Abstract

We study the problem of sampling high and infinite dimensional target 
measures arising in applications such as conditioned diffusions and inverse 
problems. We focus on those that arise from approximating measures on 
Hilbert spaces defined via a density with respect to a Gaussian reference 
measure. We consider the Metropolis-Hastings algorithm that adds an 
accept-reject mechanism to a Markov chain proposal in order to have the 
target measure as an ergodic invariant measure. We focus on cases where 
the proposal is either a Gaussian random walk (RWM) with covariance 
equal to that of the reference measure or an Ornstein-Uhlenbeck proposal 
(pCN) for which the reference measure is invariant. 

Previous results in terms of scaling and diffusion limits suggested that 
the pCN has a convergence rate that is independent of the dimension while 
the RWM method has undesirable dimension-dependent behaviour. We
confirm this claim by showing a dimension-independent Wasserstein spectral
gap for the pCN algorithm for a large class of target measures. In our
setting this Wasserstein spectral gap implies an $L^2$-spectral gap. We use
both spectral gaps to show that the ergodic average satisfies a strong law 
of large numbers, the central limit theorem and non-asymptotic bounds 
on the mean square error, all dimension independent. In contrast we show 
that the RWM algorithm applied to the reference measures degenerates 
as the dimension tends to infinity. 

1 Introduction 

The aim of this article is to study the complexity of certain sampling algorithms
in high dimensions. Creating samples from a high dimensional probability
distribution is important for Bayesian inverse problems [35] and Bayesian statistics
[32]. In Bayesian nonparametrics [19], which has recently become more and
more important for applications, sampling algorithms are the main tools for
extracting information from the posterior. Last but not least, our results are
applicable to a certain class of conditioned diffusions [23].

The most widely used methods for general target measures are Markov chain
Monte Carlo (MCMC) algorithms, which run an ergodic Markov chain with the
target measure as the invariant measure. Under certain conditions the empirical
average of a function $f$ (observable) applied to the steps of the Markov chain
converges to the expectation of this function with respect to the target measure.
The computational cost of such an algorithm is the product of the cost of one 
step and the number of steps necessary for a certain level of accuracy. While in 
most applications the cost of one step grows with dimensionality, a major result 
of this article is to show that under certain conditions an upper bound on the 
number of steps which are necessary is independent of the dimension. 

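For ease of presentation we work on a separable Hilbert space $(\mathcal{H}, \|\cdot\|)$
equipped with a mean-zero Gaussian reference measure $\gamma$ with covariance
operator $C$. Let $\{e_n\}_{n\in\mathbb{N}}$ be an orthonormal basis of eigenvectors of $C$ corresponding to
the eigenvalues $\{\lambda_n^2\}_{n\in\mathbb{N}}$. Thus $\gamma$ can be written as its Karhunen-Loève expansion
(c.f. ID)

$$\gamma = \mathcal{L}\Big(\sum_{i=1}^{\infty} \lambda_i e_i \xi_i\Big), \quad \text{where } \xi_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$

where $\mathcal{L}(\cdot)$ denotes the law of a random variable. The target measure $\mu$ is
assumed to have a density with respect to $\gamma$ of the form

$$\mu(dx) = M \exp(-\Phi(x))\,\gamma(dx). \qquad (1.1)$$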

Gaussian measures have the property that there are always many Hilbert spaces
which satisfy $\gamma(\mathcal{H}) = 1$. We will assume that $\Phi : \mathcal{H} \to \mathbb{R}$ is Lipschitz and that
the reference measure $\gamma$ has the property that $\gamma(\mathcal{H}) = 1$. For Bayesian problems
this amounts to the choice of prior; for conditioned diffusions it restricts the class
of admissible target measures. With $P_m$ the projection onto the first $m$ basis
elements we consider the following $m$-dimensional approximations to $\gamma$ and $\mu$:

$$\gamma_m = \mathcal{L}\Big(\sum_{i=1}^{m} \lambda_i e_i \xi_i\Big),$$

$$\mu_m(dx) = M_m \exp(-\Phi(P_m x))\,\gamma_m(dx). \qquad (1.2)$$

The approximation error, namely the difference between $\mu$ and $\mu_m$, is already
well studied (ITS', TO] for example) and can be estimated in terms of the closeness
between $\Phi \circ P_m$ and $\Phi$.
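The truncated expansion defining $\gamma_m$ can be sampled coordinate by coordinate in the eigenbasis. The following is a minimal sketch under stated assumptions: the decay $\lambda_i = 1/i$, the function names and the sample sizes are illustrative choices and are not taken from the text. It compares a Monte Carlo estimate of $\mathbb{E}\|x\|^2$ under $\gamma_m$ with the partial trace $\sum_{i \le m} \lambda_i^2$.

```python
import math
import random

def sample_truncated_kl(m, lam, rng):
    """Return the coefficients lambda_i * xi_i, i = 1..m, in the eigenbasis."""
    return [lam(i) * rng.gauss(0.0, 1.0) for i in range(1, m + 1)]

def squared_norm(coeffs):
    """Squared Hilbert-space norm in orthonormal coordinates."""
    return sum(c * c for c in coeffs)

# Hypothetical choice lambda_i = 1/i (an assumption for illustration):
# the eigenvalues lambda_i^2 = i^{-2} are summable.
lam = lambda i: 1.0 / i

rng = random.Random(0)
m = 50
n_samples = 2000
mean_sq_norm = sum(
    squared_norm(sample_truncated_kl(m, lam, rng)) for _ in range(n_samples)
) / n_samples

# Exact value of E||x||^2 under gamma_m: the partial trace of C.
trace_m = sum(lam(i) ** 2 for i in range(1, m + 1))
```

Because the partial traces increase with $m$ and stay bounded by the full trace, the second moment of $\gamma_m$ is bounded uniformly in $m$.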

In this article we consider Metropolis-Hastings MCMC methods ([36] and
[24]). For an overview of other MCMC methods, which have been developed
and analyzed, consult [33]. The idea of the Metropolis-Hastings algorithm is
to add an independent accept-reject mechanism to a Markov chain proposal in
order to have the target measure as an ergodic invariant measure. We denote by
$Q(x,dy)$ the transition kernel of the underlying Markov chain and by $\alpha(x,y)$
the acceptance probability for a proposed move from $x$ to $y$. The transition
kernel of the Metropolis-Hastings algorithm reads

$$\mathcal{P}(x,dz) = Q(x,dz)\,\alpha(x,z) + \delta_x(dz)\int \big(1-\alpha(x,u)\big)\,Q(x,du) \qquad (1.3)$$
where $\alpha(x,y)$ is chosen such that $\mathcal{P}(x,dy)$ is reversible with respect to $\mu$.
According to |Sn], one considers $\nu = \mu(dx)Q(x,dy)$ and $\nu^{T} = \mu(dy)Q(y,dx)$ on
a subset where they are mutually absolutely continuous and there one takes
$\alpha(x,y) = 1 \wedge r(x,y)$ with $r = \frac{d\nu^{T}}{d\nu}$; on the complement of this subset $\alpha(x,y) = 0$.
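A common proposal kernel corresponds to the random walk

$$Q(x,dy) = \mathcal{L}\big(x + \sqrt{2\delta}\,\xi\big)$$

with $\xi \sim \gamma_m$, which leads to the acceptance probability

$$\alpha(x,y) = 1 \wedge \exp\Big(\Phi(x)-\Phi(y) + \frac12\langle x, C^{-1}x\rangle - \frac12\langle y, C^{-1}y\rangle\Big). \qquad (1.4)$$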

Notice that the quadratic forms $\langle x, C^{-1}x\rangle$ and $\langle y, C^{-1}y\rangle$ are almost surely
infinite in $\mathcal{H}$ since they correspond to the Cameron-Martin norm of $x$ and $y$
respectively. For this reason the RWM algorithm is not defined on the infinite
dimensional Hilbert space (see [11] for a discussion) and we will study it only
on $m$-dimensional approximating spaces. Furthermore, it is intuitive that the
algorithms we study will degenerate in some way as the dimension $m$ increases.
In this article we will demonstrate that the RWM can be considerably improved
upon by using the preconditioned Crank-Nicolson (pCN) algorithm, which is a
well-defined algorithm on $\mathcal{H}$, and corresponds to



$$Q(x,dy) = \mathcal{L}\big((1-2\delta)^{1/2}x + \sqrt{2\delta}\,\xi\big) \qquad (1.5)$$
$$\alpha(x,y) = 1 \wedge \exp\big(\Phi(x) - \Phi(y)\big) \qquad (1.6)$$

with $\xi \sim \gamma$. The pCN was introduced in [5]. Numerical experiments in [11]
demonstrate its favorable properties in comparison with the RWM algorithm. 
Notice that, in contrast to RWM, the acceptance probability is well-defined on 
Hilbert space and this fact gives an intuitive explanation for the theoretical 
results we derive in this paper, in which we develop a theory that explains the
superiority of pCN over RWM. Our main positive results about pCN can be
summarised as follows (rigorous statements in Theorems 2.14, 2.15, 4.2 and 4.4):

Claim. Suppose $\Phi$ and its local Lipschitz constant both satisfy a growth
assumption at infinity. Then the pCN algorithm applied to $\mu$ ($\mu_m$)

1. has a unique invariant measure $\mu$ ($\mu_m$);

2. has a Wasserstein spectral gap uniformly in $m$;

3. has an $L^2$-spectral gap $1-\beta$ uniform in $m$.

The corresponding sample average $S_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i)$

4. satisfies a strong law of large numbers and a central limit theorem (CLT)
for a class of locally Lipschitz functionals for every initial condition;

5. for $f \in L^2_{\mu}$ ($L^2_{\mu_m}$) satisfies a CLT for $\mu$ ($\mu_m$)-almost every initial
condition with asymptotic variance uniformly bounded in $m$;

6. admits an explicit bound on the mean square error (MSE) between $S_n(f)$
and $\mu(f)$ for certain initial distributions $\nu$.

These positive results about pCN clearly apply for $\Phi = 0$, which corresponds to
the target measure $\gamma$ and $\gamma_m$ respectively; in this case the acceptance probability
of pCN is always one, and the theorems mentioned are simply statements about
a discretely sampled Ornstein-Uhlenbeck (OU) process on $\mathcal{H}$. On
the other hand the RWM algorithm applied to the target measure $\gamma_m$ has an
$L^2$-spectral gap that converges to $0$ as $m \to \infty$ as fast as any negative power of $m$,
see Theorem 2.17.

While it is a major contribution of this article to verify 1, 2, 4 and the negative
result for the RWM, items 3, 5 and 6 are consequences of verifying conditions of known
results.

In addition to the significance of the results themselves for the understanding 
of MCMC methods, we would also like to highlight the techniques of proof that 
we use. We use recently developed tools for the study of Markov chains on
infinite dimensional spaces [22] that, for many problems, improve significantly on
the machinery that has been used for the study of MCMC methods to date. The
weak Harris theorem makes a Wasserstein spectral gap verifiable in practice and for
reversible Markov processes it even implies an $L^2$-spectral gap. Previous results
have been formulated in terms of the following three main types of convergence:

1. For a metric $d$ on the space of measures the convergence rate is given
as the decay rate of $d(\nu\mathcal{P}^n, \mu)$, where $\nu$ is the initial distribution of the
Markov chain. The most prominent examples here are convergence in a
(weighted) total variation and in a Wasserstein distance.

2. For the Markov operator $\mathcal{P}$ the convergence rate is given as the operator
norm of $\mathcal{P}$ on a space of functions from $\mathcal{H}$ to $\mathbb{R}$ modulo constants. The
most prominent example here is the $L^2$-spectral gap.

3. The (asymptotic) convergence rate of $S_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i)$ to $\mu(f)$ for a
class of functions $f$ in the form of a CLT or an MSE bound.

Between these notions of convergence, there are many fruitful relations, see e.g.
[46]. All these convergence types have been used to study MCMC algorithms.
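The third notion of convergence can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions rather than part of the analysis: it takes the scalar case $\Phi = 0$, in which the pCN chain reduces to a discretely sampled OU process with invariant law $\mathcal{N}(0,1)$, and it uses the observable $f(x) = x^2$, a step size and a run length chosen purely for illustration.

```python
import math
import random

def ergodic_average(f, x0, delta, n, rng):
    """S_n(f) = (1/n) * sum_{i=1}^n f(X_i) along a scalar OU/AR(1) chain."""
    a = math.sqrt(1.0 - 2.0 * delta)
    b = math.sqrt(2.0 * delta)
    x, total = x0, 0.0
    for _ in range(n):
        # pCN proposal with Phi = 0: the move is always accepted.
        x = a * x + b * rng.gauss(0.0, 1.0)
        total += f(x)
    return total / n

# The invariant measure is N(0, 1), so mu(f) = 1 for f(x) = x^2.
s_n = ergodic_average(lambda x: x * x, x0=5.0, delta=0.25, n=20000,
                      rng=random.Random(0))
```

Even though the chain is started far from stationarity, the contribution of the initial condition to $S_n(f)$ decays like $1/n$ because the chain forgets its starting point geometrically fast.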

The first systematic approach to prove $L^2$-spectral gaps for Markov chains
was developed in |3T] using the conductance concept due to Cheeger ([!]). These
results were extended and applied to the Metropolis-Hastings algorithm with
uniform proposal and a log-concave target distribution on a bounded convex
subset of $\mathbb{R}^n$ in •. The consequences of a spectral gap for the ergodic average
in terms of a CLT and the MSE have been investigated in [26], [12] and [46]
respectively and were first brought up in the MCMC literature in [15].
For finite state Markov chains the spectral gap can be bounded in terms of
quantities associated with its graph |TS] and this idea has also been applied to
the Metropolis algorithm in [48] and [17].

A different approach using the now called splitting chain technique was in- 
dependently developed in and [5] to bound the total variation distance 
between the n-step kernel and the invariant measure. Small and petite sets are 
used in order to split the trajectory of a Markov chain into independent blocks. 
This theory was fully developed in [37] and again adapted and applied to the
Metropolis-Hastings algorithm in [44], resulting in a criterion for geometric
ergodicity

$$\|\mathcal{P}^n(x,\cdot) - \mu\|_{TV} \le C(x)\,c^n \quad \text{for some } c < 1.$$

Moreover, they also established a criterion for a CLT. Extending this method, 
it was also possible to derive rigorous confidence intervals in [29] . 

In most infinite dimensional settings the splitting chain method cannot be 
applied since measures tend to be mutually singular. The method is hence not 
well-adapted to the high-dimensional setting. Even Gaussian measures with 
the same covariance operator are only equivalent if the difference between their 
means lies in the Cameron-Martin space. As a consequence, the discrete time 
Ornstein-Uhlenbeck process on a function space is not irreducible in the sense
of [37], i.e. there is no non-trivial measure $\psi$ such that $\psi(A) > 0$ implies
$\mathcal{P}(x,A) > 0$ for all $x$. By inspecting the Metropolis-Hastings transition kernel
(1.3), the pCN algorithm is not irreducible, since if $x - y$ is not an element of
the Cameron-Martin space, each measure in the decomposition for $\mathcal{P}(x,\cdot)$ is
mutually singular to each measure in the same decomposition for $\mathcal{P}(y,\cdot)$. This
may also be shown to be true for the $n$-step kernel by expressing it as a sum of
densities times Gaussian measures and applying the Feldman-Hajek theorem.

For these reasons, existing theoretical results concerning RWM and pCN in 
high dimensions have been confined to scaling results and derivation of diffusion 
limits. In [4] the RWM with a target that is absolutely continuous with respect
to a product measure has been analyzed for its dependence on the dimension. 
The proposal distribution is a centered normal random variable with covariance 
matrix Gnln- The main result there is that 5 has to be chosen as a constant times 
a particular negative power of n to prevent the expected acceptance probability 
to go to one or zero. In a similar setup it was recently shown [35] that there
is a $\mu$-reversible SPDE limit if the product law is a truncated Karhunen-Loève
expansion. This SPDE limit suggests that the number of steps necessary for a
certain level of accuracy grows like $O(m)$, because in order to approximate the
SPDE limit on $[0,T]$, $O(m)$ steps are necessary. A similar result in [41] suggests
that the pCN algorithm only needs $O(1)$ steps.

Uniform contraction in a Wasserstein distance was first applied to MCMC 
in in order to get bound on the variance and bias of the sample average of 
Lipschitz functionals. We use the weak Harris theorem to verify this contraction
and, using the results from [46], obtain non-asymptotic bounds on the sample average
of $L^2_\mu$ functionals.
In this paper we substantiate these ideas by using spectral gaps derived
by applying the weak Harris theory of [25]. Section 2 contains the statement of
our main results, namely Theorems 2.9, 2.11 and 2.13 concerning the desirable
dimension-independence properties of the pCN method, and Theorem 2.16
concerning the undesirable dimension dependence of the RWM method.
Section 2 starts by specifying the RWM and pCN algorithms as Markov chains,
followed by a statement of the weak Harris theorem and a discussion of the
relationship between exponential convergence in a Wasserstein distance and
spectral gaps.
Proofs of the theorems from Section 2 are given in Section 3. In Section 4 we
exploit the Wasserstein and $L^2$-spectral gaps in order to derive a law of large
numbers (LLN), central limit theorems (CLTs) and mean square error (MSE)
bounds for sample-path ergodic averages of the pCN method, again emphasizing
dimension-independence of the results. We draw some overall conclusions
in Section 5.
to MH by EPSRC grant EP/D071593/1, by the Royal Society through a Wolfson
Research Merit Award, and by the Leverhulme Trust through a Philip Lever- 
hulme Prize. AMS is grateful to EPSRC and ERG for financial support. SJV 
2 Main Results



In Section 2.1 we specify the RWM and pCN algorithms and in Section 2.2 we
summarize the weak Harris theorem and present how a Wasserstein spectral gap
implies an $L^2$-spectral gap. In Section 2.3 we give necessary conditions on the
target measure for the pCN algorithm to have a dimension independent spectral
gap in a Wasserstein distance. In Section 2.4 we highlight the downside of the
RWM by giving an example that satisfies our assumption for the pCN algorithm
for which the spectral gap of the RWM algorithm converges to zero as fast as
any negative power of $m$ for $m \to \infty$.



2.1 Algorithms 

We focus on convergence results for the pCN algorithm (Algorithm 1) that
generates a Markov chain $\{X^n\}_{n\in\mathbb{N}}$ with $X^n \in \mathcal{H}$ and $\{X^n_m\}_{n\in\mathbb{N}}$ when applied
to a measure $\mu$ and $\mu_m$ respectively. The corresponding Markov transition
kernels are called $\mathcal{P}$ and $\mathcal{P}_m$ respectively. We use the same notation for the
Markov chain generated by the RWM (Algorithm 2). This should not cause
confusion as statements concerning the pCN and RWM algorithms occur in
separate subsections.
Algorithm 1 Preconditioned Crank-Nicolson
Initialize $X_0$.
For $n \ge 0$ do:

1. Generate $\xi \sim \gamma$ and set $p_{X_n}(\xi) = (1-2\delta)^{1/2} X_n + \sqrt{2\delta}\,\xi$.

2. Set $X_{n+1} = p_{X_n}$ with probability $\alpha(X_n, p_{X_n}) = 1 \wedge \exp\big(\Phi(X_n) - \Phi(p_{X_n})\big)$,
and $X_{n+1} = X_n$ otherwise.
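Algorithm 1 can be sketched in a few lines for the $m$-dimensional approximation, working in coordinates with respect to the eigenbasis of $C$. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the representation of the state as a list of coefficients, and the list `lam` of standard deviations $\lambda_i$ are hypothetical choices.

```python
import math
import random

def pcn_step(x, phi, delta, lam, rng):
    """One pCN step in eigenbasis coordinates; returns (new_state, accepted)."""
    a = math.sqrt(1.0 - 2.0 * delta)
    b = math.sqrt(2.0 * delta)
    # xi ~ gamma_m has independent N(0, lam_i^2) coordinates.
    y = [a * xi_old + b * l * rng.gauss(0.0, 1.0) for xi_old, l in zip(x, lam)]
    # Acceptance probability 1 ^ exp(Phi(x) - Phi(y)); no Cameron-Martin term.
    log_accept = min(0.0, phi(x) - phi(y))
    if rng.random() < math.exp(log_accept):
        return y, True
    return x, False

def run_pcn(x0, phi, delta, lam, n_steps, rng):
    """Run n_steps of pCN and return the final state and the acceptance rate."""
    x, n_acc = list(x0), 0
    for _ in range(n_steps):
        x, acc = pcn_step(x, phi, delta, lam, rng)
        n_acc += acc
    return x, n_acc / n_steps
```

For $\Phi = 0$ every proposal is accepted regardless of the dimension, which reflects the fact that the proposal itself leaves the reference measure invariant.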



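Algorithm 2 Random Walk Metropolis
Initialize $X_0$.
For $n \ge 0$ do:

1. Generate $\xi \sim \gamma_m$ and set $p_{X_n}(\xi) = X_n + \sqrt{2\delta}\,\xi$.

2. Set $X_{n+1} = p_{X_n}$ with probability
$\alpha(X_n, p_{X_n}) = 1 \wedge \exp\big(\Phi(x) - \Phi(y) + \frac12\langle x, C^{-1}x\rangle - \frac12\langle y, C^{-1}y\rangle\big)$, where $x = X_n$ and $y = p_{X_n}$,
and $X_{n+1} = X_n$ otherwise.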
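The dimension dependence of Algorithm 2 can be observed numerically. The sketch below is an illustration under stated assumptions: it takes $\Phi = 0$, a fixed step size $\delta = 0.1$, and works in whitened coordinates $z_i = x_i/\lambda_i$ (a change of variables introduced here for convenience), in which $\gamma_m$ becomes a standard Gaussian and $\langle x, C^{-1}x\rangle = |z|^2$. The function name and the run lengths are hypothetical.

```python
import math
import random

def rwm_acceptance_rate(m, delta, n_steps, rng):
    """Empirical RWM acceptance rate for the target gamma_m (Phi = 0).

    In whitened coordinates the proposal is z + sqrt(2*delta) * eta with
    eta ~ N(0, I_m), and the log acceptance ratio is |z|^2/2 - |y|^2/2.
    """
    step = math.sqrt(2.0 * delta)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # start in stationarity
    accepted = 0
    for _ in range(n_steps):
        y = [zi + step * rng.gauss(0.0, 1.0) for zi in z]
        log_ratio = 0.5 * sum(zi * zi for zi in z) - 0.5 * sum(yi * yi for yi in y)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z, accepted = y, accepted + 1
    return accepted / n_steps

rng = random.Random(0)
rates = {m: rwm_acceptance_rate(m, 0.1, 500, rng) for m in (5, 50, 500)}
```

With the step size held fixed, the empirical acceptance rate collapses as $m$ grows, in contrast to pCN with $\Phi = 0$, where every proposal is accepted.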
2.2 Preliminaries



Here we introduce Lyapunov functions, Wasserstein distances, d-small sets and 
d-contracting Markov kernels in order to state a weak Harris theorem recently 
proved in [25]. We use this theorem to prove our main results. By weakening 
the notion of a small set, this theorem gives a sufficient condition for exponential 
convergence in a Wasserstein distance. We explain how this in turn implies an
$L^2$-spectral gap, which is a major reason for the importance of the weak Harris
theorem.



2.2.1 Weak Harris Theorem 

Definition 2.1. Given a Polish space $E$, a function $d : E \times E \to \mathbb{R}_+$ is a
distance-like function if it is symmetric, lower semi-continuous and $d(x,y) = 0$
is equivalent to $x = y$.

This induces the 1-Wasserstein "distance" associated with $d$ for measures

$$d(\nu_1,\nu_2) = \inf_{\pi \in \Gamma(\nu_1,\nu_2)} \int_{E \times E} d(x,y)\,\pi(dx,dy) \qquad (2.1)$$

where $\Gamma(\nu_1,\nu_2)$ is the set of couplings of $\nu_1$ and $\nu_2$ (all measures on $E \times E$ with
marginals $\nu_1$ and $\nu_2$). If $d$ is a metric the Monge-Kantorovich duality states

$$d(\nu_1,\nu_2) = \sup_{\|f\|_{Lip(d)} = 1} \int f\,d\nu_1 - \int f\,d\nu_2.$$

We use the same notation for the distance and the associated Wasserstein
distance; we hope that this does not lead to any confusion.

Definition 2.2. A Markov kernel $\mathcal{P}$ is $d$-contracting if there is $0 < c < 1$ such
that $d(x,y) < 1$ implies

$$d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le c \cdot d(x,y).$$

Definition 2.3. Let $\mathcal{P}$ be a Markov operator over a Polish space $E$ endowed
with a distance-like function $d : E \times E \to [0,1]$. A set $S \subset E$ is said to be $d$-small
if there exists $0 < s < 1$ such that for every $x, y \in S$

$$d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le s.$$

The $d$-Wasserstein distance associated with $d(x,y) = \chi_{\{x \ne y\}}(x,y)$ coincides
with the total variation distance. If $S$ is a small set (c.f. [37]) there is a
probability measure $\nu$ such that $\mathcal{P}$ can be decomposed into

$$\mathcal{P}(x,dz) = s\,\tilde{\mathcal{P}}(x,dz) + (1-s)\,\nu(dz) \quad \text{for } x \in S,$$

which implies $d_{TV}(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le s$; hence $S$ is $d$-small, too.
Definition 2.4. A Markov kernel $\mathcal{P}$ has a Wasserstein spectral gap if there is
a $\lambda > 0$ and a $C > 0$ such that

$$d(\nu_1\mathcal{P}^n, \nu_2\mathcal{P}^n) \le C\exp(-\lambda n)\,d(\nu_1,\nu_2) \quad \text{for all } n \in \mathbb{N}.$$

Definition 2.5. $V$ is a Lyapunov function for the Markov operator $\mathcal{P}$ if there
exist $K > 0$ and $0 \le l < 1$ such that

$$\mathcal{P}^n V(x) \le l^n V(x) + K \quad \text{for all } x \in E \text{ and all } n \in \mathbb{N}. \qquad (2.2)$$

Remark. Condition (2.2) is sometimes referred to as a drift condition because it implies that
$\mathbb{E}(V(X_{n+1}))$ is smaller than $V(X_n)$ if $V(X_n) > \frac{K}{1-l}$.
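A drift condition of the form (2.2) can be checked by hand in the simplest case. The sketch below is an illustration under stated assumptions: it considers the scalar chain $X_{n+1} = (1-2\delta)^{1/2}X_n + \sqrt{2\delta}\,\lambda\,\xi_n$ (pCN with $\Phi = 0$ in one coordinate) and $V(x) = x^2$, for which $\mathbb{E}[V(X_n) \mid X_0 = x] = l^n x^2 + \lambda^2(1 - l^n)$ with $l = 1-2\delta$, so that (2.2) holds with $K = \lambda^2$. The function names and sample sizes are hypothetical.

```python
import math
import random

def n_step_drift_bound(v0, delta, lam, n):
    """Right-hand side l^n * V(x) + K with l = 1 - 2*delta and K = lam**2."""
    return (1.0 - 2.0 * delta) ** n * v0 + lam ** 2

def mc_n_step_expectation(x, delta, lam, n, n_samples, rng):
    """Monte Carlo estimate of E[V(X_n) | X_0 = x] for V(x) = x**2."""
    a = math.sqrt(1.0 - 2.0 * delta)
    b = math.sqrt(2.0 * delta)
    total = 0.0
    for _ in range(n_samples):
        y = x
        for _ in range(n):
            y = a * y + b * lam * rng.gauss(0.0, 1.0)
        total += y * y
    return total / n_samples

estimate = mc_n_step_expectation(3.0, 0.25, 1.0, 3, 20000, random.Random(0))
bound = n_step_drift_bound(9.0, 0.25, 1.0, 3)
```

In this toy case the contraction factor $l = 1-2\delta$ and the constant $K$ do not depend on how many coordinates are present, which is the kind of uniformity in $m$ that the later lemmas establish for general $\Phi$.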

Proposition 2.6. (Weak Harris Theorem [25]) Let $\mathcal{P}$ be a Markov kernel
over a Polish space $E$. Assume that:

1. $\mathcal{P}$ has a Lyapunov function $V$ such that (2.2) holds;

2. $\mathcal{P}$ is $d$-contracting for a distance-like function $d : E \times E \to [0,1]$;

3. the set $S = \{x \in E : V(x) \le 4K\}$ is $d$-small.

Then there exists $\tilde{n}$ such that for any probability measures $\nu_1, \nu_2$ on $E$ we
have

$$\tilde{d}(\nu_1\mathcal{P}^{\tilde{n}}, \nu_2\mathcal{P}^{\tilde{n}}) \le \frac12\,\tilde{d}(\nu_1,\nu_2),$$

where $\tilde{d}(x,y) = \sqrt{d(x,y)(1 + V(x) + V(y))}$ and $\tilde{n}(l, K, c, s)$ is increasing in $l$,
$K$, $c$ and $s$. Moreover, if there exists a complete metric $d_0$ on $E$ such that
$d_0 \le \sqrt{d}$ and such that $\mathcal{P}$ is Feller on $E$, then there is a unique invariant
measure $\mu$ for $\mathcal{P}$.

Remark. For $\nu_2 = \mu$ we obtain the convergence rate to the invariant measure.

2.2.2 Wasserstein implies $L^2$-spectral Gap

In this section we explain why a Wasserstein spectral gap under mild assumptions
implies an $L^2$-spectral gap.

Definition 2.7. ($L^2_\mu$-spectral gap) A Markov operator $\mathcal{P}$ with invariant measure
$\mu$ has an $L^2_\mu$-spectral gap $1 - \beta$ if for $L^2_0 = \{f \in L^2_\mu \mid \mu(f) = 0\}$

$$\beta = \|\mathcal{P}\|_{L^2_0 \to L^2_0} < 1.$$

The following proposition is due to F.-Y. Wang and is a discrete-time version
of Theorem 2.1(2) [21]. It was also rediscovered in [39]. The proof given below
is from private communication with F.-Y. Wang and is presented because of
its beauty and the tremendous consequences in combination with the weak Harris
theorem.
Proposition 2.8. ([45] / Private Communication) Let $\mathcal{P}$ be a Markov transition
operator that is reversible with respect to /i and suppose Lip{S) D n is 
dense in for some C , then 

implies the L^-spectral gap 

WV'^f - M(/)ll2 < 11/ - Kf)\\hM~^ri). (2.3) 

Proof. Let < / G Lip n L°°{^) with ^{f) — 1 and tt be the optimal cou- 
pling between {V'^"f)n and /x for the Wasserstein distance associated with d. 
Reversibility impUes J{Vf)'^dfi = /(P^"/)/^^^ which leads to 

ll^'7-M/)ll2 = t^{{v-ff)-i = j{f{^)-f{v))d^ 

< Lip{f) j S{x,y)d^T<L^p{f)~S{V''^f^,,^i) 

= Ltp{f)6i{ffi)V^",fi) < CLipif) exp(-2An). 

Since the above extends to a • / + 6 for general / G L°° n Lip{6), we note 
that 



\Ptf~Ml < 2\\pj+ - f,if+)\\l + 2\\Ptf- - ^l{f-)\\l. 



By Lemma 2.9 the bound (2.3) holds for functions in $Lip \cap L^\infty(\mu)$, hence
the result follows by taking limits of such functions. □

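Lemma 2.9. Let $\mathcal{P}$ be a Markov transition operator that is reversible with
respect to $\mu$. If for some $f \in L^2_\mu$ and constants $C(f)$ and $\lambda > 0$

$$\|\mathcal{P}^n f - \mu(f)\|_2 \le C(f)\exp(-\lambda n),$$

then for all $n \in \mathbb{N}$

$$\|\mathcal{P}^n f - \mu(f)\|_2 \le \|f - \mu(f)\|_2 \exp(-\lambda n).$$

Proof. Without loss of generality we assume $\mu(\hat{f}^2) = 1$ where $\hat{f} = f - \mu(f)$.
Applying the spectral theorem to $\mathcal{P}$ yields the existence of a unitary map
$U : L^2(\mu) \to L^2(X,\nu)$ such that $U\mathcal{P}U^{-1}$ is a multiplication operator by $m$.
Moreover, $\mu(\hat{f}^2) = 1$ implies that $(U\hat{f})^2\nu$ is a probability measure such that for
$k \in \mathbb{N}$

$$\int (\mathcal{P}^n\hat{f}(x))^2\,d\mu = \int m(x)^{2n}(U\hat{f})^2(x)\,d\nu = \int m(x)^{2n}\,d(U\hat{f})^2\nu$$

$$\le \Big(\int m(x)^{2n+2k}\,d(U\hat{f})^2\nu\Big)^{\frac{n}{n+k}} \le C(f)^{\frac{2n}{n+k}}\exp(-2\lambda n),$$

where the first inequality is Jensen's inequality and the second applies the
assumption at time $n+k$. Letting $k \to \infty$ yields the required claim. □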
2.3 Dimension-Independent Spectral Gaps for pCN



Using the weak Harris theorem we give necessary conditions on $\mu$ (see (1.1))
in terms of regularity and growth of $\Phi$ to have a uniform spectral gap in a
Wasserstein distance for $X^n$ and $X^n_m$. We need $\Phi$ to be at least locally Lipschitz;
the case where it is globally Lipschitz is more straightforward and is presented
first. Using the notation $\rho = 1 - (1 - 2\delta)^{1/2}$ we can express the proposal of the
pCN algorithm as

$$p_{X_n}(\xi) = (1-\rho)X_n + \sqrt{2\delta}\,\xi.$$
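The two coefficients of the proposal are tied together by $(1-\rho)^2 + 2\delta = 1$, which is why the proposal leaves a centred Gaussian with any fixed covariance invariant. The following small check is a sketch for illustration; the function names are hypothetical and the restriction $\delta \in (0, \frac12]$ is the range in which the square root is real.

```python
import math

def pcn_coefficients(delta):
    """Return (1 - rho, sqrt(2*delta)) with rho = 1 - (1 - 2*delta)**0.5."""
    if not 0.0 < delta <= 0.5:
        raise ValueError("delta must lie in (0, 1/2]")
    rho = 1.0 - math.sqrt(1.0 - 2.0 * delta)
    return 1.0 - rho, math.sqrt(2.0 * delta)

def proposal_variance(delta, prior_variance):
    """Variance of one coordinate of the proposal when X_n ~ N(0, prior_variance)."""
    a, b = pcn_coefficients(delta)
    # X_n and xi are independent with the same variance, so variances add.
    return (a * a + b * b) * prior_variance
```

For $\delta = \frac12$ the contraction factor $1-\rho$ vanishes and the proposal is an independent draw from the reference measure.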

The mean of the proposal $(1-\rho)X_n$ suggests that we can prove that $f(\|\cdot\|)$
is a Lyapunov function for certain $f$ and that $\mathcal{P}$ is $d$-contracting (for a suitable
metric) if we have a lower bound on the probability of $X_{n+1}$ being in a ball
around the mean. In fact, our assumptions are stronger since we assume a
uniform lower bound on $\mathbb{P}(p_x \text{ is accepted} \mid p_x = z)$ for $z$ in $B_{r(\|x\|)}((1-\rho)x)$.

Assumption 2.10. There is $R > 0$ and a function $r : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying
$r(s) \le \frac{\rho}{2}s$ for all $|s| \ge R$ such that for $x \in B_R(0)^c$

$$\inf_{z \in B_{r(\|x\|)}((1-\rho)x)} -\Phi(z) + \Phi(x) > \alpha_l. \qquad (2.4)$$


Assumption 2.11. Let $\Phi$ in (1.1) have global Lipschitz constant $L$ and assume
that $\exp(-\Phi)$ is $\gamma$-integrable.



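Theorem 2.12. Let Assumptions 2.10 and 2.11 be satisfied with either

1. $r(\|x\|) = r\|x\|^a$ where $r \in \mathbb{R}_+$, for any $a \in (\frac12, 1)$; then we consider $V = \|x\|^i$
with $i \in \mathbb{N}$ or $V = \exp(v\|x\|)$, or

2. $r(\|x\|) = r$ for $r \in \mathbb{R}_+$; then we take $V = \|x\|^i$ with $i \in \mathbb{N}$.

Then $\mu$ ($\mu_m$) is the unique invariant measure for the Markov chain associated
with the pCN algorithm applied to $\mu$ ($\mu_m$). Moreover, define

$$\tilde{d}(x,y) = \sqrt{d(x,y)(1 + V(x) + V(y))} \quad \text{with} \quad d(x,y) = 1 \wedge \frac{\|x-y\|}{\epsilon}.$$

Then for $\epsilon$ small enough there is an $\tilde{n}$ such that for all probability measures $\nu_1, \nu_2$
on $\mathcal{H}$ and on $P_m\mathcal{H}$ respectively and for all $m \in \mathbb{N}$

$$\tilde{d}(\nu_1\mathcal{P}^{\tilde{n}}, \nu_2\mathcal{P}^{\tilde{n}}) \le \frac12\,\tilde{d}(\nu_1,\nu_2), \qquad \tilde{d}(\nu_1\mathcal{P}_m^{\tilde{n}}, \nu_2\mathcal{P}_m^{\tilde{n}}) \le \frac12\,\tilde{d}(\nu_1,\nu_2).$$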


Proof. The conditions of the weak Harris theorem (Proposition 2.6) are satisfied by
Lemmas 3.3, 3.4 and 3.5, and the uniqueness follows by Proposition 3.9. □
A key step in the proof is to verify the $d$-contraction. In order to get an
upper bound on $d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot))$ (see (2.1)) we choose a particular coupling
between the algorithm started at $x$ and $y$ and distinguish between the cases
when both proposals are accepted, both are rejected and only one is accepted.
The case when only one of them accepts is the most difficult to tackle. By
choosing $d = 1 \wedge \frac{\|x-y\|}{\epsilon}$ with $\epsilon$ small enough, it turns out that the Lipschitz
constant of $\alpha(x,y)$ can be brought under control.

By changing the distance function d we can also handle the locally Lipschitz 
case provided that the local Lipschitz constant does not grow too fast. 

Assumption 2.13. Let exp(— $) he integrable with respect to 7 and assume 
that for any k > there is an such that 

|$(x)-<l>(y)| 



(r) 



sup 



2.10 



and 2.13 he satisfied withr{\\x\\) 



Theorem 2.14. Let Assumption , 
with r e M, a € (i, 1) and either V = \\x\\^ with i G N or V = exp(w ||a;||). 

Then /i (^.m) is the unique invariant measure for the Markov chain associated 
with the pCN algorithm applied to fi (^m). 

For k{T,x,y) := {V € C\%T],7i)M^) = x,^{T) = y,\m = 1}, d as 
above with 



d{x,y) = lA inf 

T.,ipeh(T., 



1 



cxp(?7||V'||)di 



and rj ande small enough there is an n such that for all Vi, 1/2 probability measures 
on T-L and on PmH respectively and m G N 



< 



< 



\d{vi,V2) 
\d{vi,V2) 



Remark. A Wasserstein spectral gap for the $n$-step transition kernel and an
estimate of the form

$$\tilde{d}(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le C\,\tilde{d}(x,y) \qquad (2.5)$$

imply a spectral gap for the one-step kernel. Using that $V$ is a Lyapunov
function and $\mathcal{P}$ is $d$-contracting, a straightforward calculation shows (2.5).

Proof. This time Lemmas 3.3, 3.7 and 3.8 verify the conditions of the weak
Harris theorem (Proposition 2.6), and Proposition 3.9 again yields the uniqueness. □



Remark. Our arguments work for $\delta \in (0, \frac12]$; for $\delta = \frac12$ Assumption 2.10
degenerates to

$$\sup_{h \in \mathcal{H}} \Phi(h) - \inf_{h \in \mathcal{H}} \Phi(h) < \infty.$$

In this case $\mathcal{P}(x,\cdot)$ and $\mathcal{P}(y,\cdot)$ are not mutually singular any more and the
theory of Meyn and Tweedie [37] applies.
In order to get the same lower bound for the $L^2_\mu$-spectral gap we just have
to verify that $Lip(\tilde{d}) \cap L^\infty(\mu)$ is dense in $L^2_\mu$.

Theorem 2.15. If the conditions of Theorem 2.12 or 2.14 are satisfied, then
we have the same lower bound on the $L^2_\mu$-spectral gap of $\mathcal{P}$ and $\mathcal{P}_m$ uniformly
in $m$.

Proof. By Proposition 2.8 we only have to show that $Lip(\tilde{d}) \cap L^\infty(\mu)$ is dense
in $L^2(\mathcal{H}, \mathcal{B}, \mu)$. By Lemmas 4.1 and 4.3, $Lip(\|\cdot\|) \subseteq Lip(\tilde{d})$, hence it is enough to show that
$Lip(\|\cdot\|) \cap L^\infty(\mu)$ is dense in $L^2(\mathcal{H}, \mathcal{B}, \mu)$. Suppose not; then there is $0 \ne g \in L^2(\mu)$
such that

$$\int f g\,d\mu = 0 \quad \text{for all } f \in Lip \cap L^\infty(\mu).$$

Since all measures on a separable Banach space equipped with the Borel
$\sigma$-algebra are characterised by their characteristic functional (Bochner's theorem,
e.g. [2]), in particular they are characterised by bounded Lipschitz functions
with respect to $\|\cdot\|$. Hence $g\,d\mu$ is the zero measure so that $g = 0$ in $L^2$. □



2.4 Dimension-Dependent Spectral Gaps for RWM 

In order to prove negative results on the spectral gap it suffices to consider a
particular case, and the analysis is made relatively straightforward by considering
the case $\Phi = 0$, so that our target measure is $\gamma_m$, and by choosing a particular
covariance operator. In this setting Theorem 2.15 shows that pCN has an
$m$-independent spectral gap; in contrast we will now show that the spectral
gap for RWM degenerates as $m$ grows, on a specific example. So far
we have shown convergence results for the pCN, so subsequently we present an
example where these results apply but the spectral gap of the RWM goes to $0$
as $m$ tends to infinity. We consider the target measures $\mu_m$ on the scale of Hilbert
spaces $\mathcal{H}_\sigma$ with $0 \le \sigma < \frac12$ given by

$$\mu_m = \gamma_m = \mathcal{L}\Big(\sum_{i=1}^{m} \frac{1}{i}\,\xi_i e_i\Big), \qquad \xi_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1). \qquad (2.6)$$



In the setting of (1.1) this corresponds to $\Phi = 0$. Hence the assumptions of
Theorem 2.14 are satisfied and we obtain a uniform lower bound on the
$L^2_\mu$-spectral gap for the pCN. For the RWM algorithm we show that the spectral
gap converges to zero faster than any negative power of $m$ if we scale $\delta = s\,m^{-a}$
for any $a \in [0, 1)$.

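Using the notion of conductance

$$C = \inf_{\mu(A) \le \frac12} \frac{\int_A \mathcal{P}(x, A^c)\,d\mu(x)}{\mu(A)} \qquad (2.7)$$

we obtain an upper bound on the spectral gap by Cheeger's inequality (c.f.
ED HE])

$$\frac{C^2}{2} \le 1 - \beta \le 2C. \qquad (2.8)$$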
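The upper bound in Cheeger's inequality, which is the half used below, can be checked by hand on the smallest possible example. The sketch is an illustration under stated assumptions: a two-state chain with transition matrix $\begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$, which is not taken from the text, and a hypothetical function name. For this chain the stationary distribution is $(q, p)/(p+q)$, the only non-trivial eigenvalue is $1-p-q$, and the conductance is the escape probability from the lighter state.

```python
def two_state_chain(p, q):
    """Conductance and L2-spectral gap of P = [[1-p, p], [q, 1-q]]."""
    pi0, pi1 = q / (p + q), p / (p + q)  # reversible stationary distribution
    # Sets A with mass at most 1/2 and their escape probability.
    candidates = []
    if pi0 <= 0.5:
        candidates.append(p)  # A = {0}: flow pi0 * p divided by pi0
    if pi1 <= 0.5:
        candidates.append(q)  # A = {1}: flow pi1 * q divided by pi1
    conductance = min(candidates)
    beta = abs(1.0 - p - q)  # norm of P on mean-zero functions
    return conductance, 1.0 - beta
```

For example, `two_state_chain(0.02, 0.01)` gives a small conductance and therefore forces a small spectral gap, which is the mechanism exploited for the RWM below.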

For the Metropolis-Hastings algorithm we can use $\alpha(x) = \int \alpha(x,y)\,Q(x,dy)$
to bound $C$.

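Proposition 2.16. Let $\mathcal{P}$ be a Metropolis-Hastings transition kernel for a target
measure $\mu$ with acceptance probability $\alpha(x,y)$. For any set $B$ with $\mu(B) \le \frac12$,
the spectral gap can be bounded by

$$1 - \beta \le 2 \sup_{x \in B} \alpha(x).$$

Proof. The algorithm can only move from $B$ to $B^c$ if it accepts the move. Hence

$$\mathcal{P}(x, B^c) \le \alpha(x).$$

Since $\mu(B) \le \frac12$ this yields the bound

$$C = \inf_{\mu(A) \le \frac12} \frac{\int_A \mathcal{P}(x,A^c)\,d\mu(x)}{\mu(A)} \le \frac{\int_B \alpha(x)\,d\mu(x)}{\mu(B)} \le \sup_{x \in B} \alpha(x),$$

and the claim follows from Cheeger's inequality. □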
Theorem 2.17. Let $\mathcal{P}_m$ be the Markov kernel and $\alpha$ be the acceptance
probability associated with the RWM algorithm applied to $\mu_m$ as in (2.6).

1. For $\delta_m \sim m^{-a}$, $a \in [0,1)$ and any $p$ there is a $K(p,a)$ such that the
spectral gap of $\mathcal{P}_m$ satisfies

$$1 - \beta_m \le K(p,a)\,m^{-p}.$$

2. For $\delta_m \sim m^{-a}$, $a \in [1,\infty)$ there is a $K(a)$ such that the spectral gap of
$\mathcal{P}_m$ satisfies

$$1 - \beta_m \le K(a)\,m^{-\frac{a}{2}}.$$

Proof. For the first part we work on the space $\mathcal{H}_\sigma$ with $\sigma \in [0, \frac12)$, where $\sigma$ is
determined later. We choose $B_r(0)$ such that $\mu(B_r(0)) < \frac12$ and by (3.1) we
know that $\mu_m(B_r^m(0))$ is decreasing towards $\mu(B_r(0))$. Hence for all $m$ larger
than some $M$ we know that $\mu_m(B_r^m(0)) < \frac12$. In order to apply Proposition 2.16
we have to get an upper bound on $\alpha(x)$ on $B_r^m(0)$. Thus we use $u \wedge v \le u^{\lambda}v^{1-\lambda}$
for $\lambda \in [0,1]$ to bound

$$\alpha(x,y) = 1 \wedge \exp\Big(-\frac12\sum_{i=1}^{m} i^2(y_i^2 - x_i^2)\Big) \le \exp\Big(-\frac{\lambda}{2}\sum_{i=1}^{m} i^2(y_i^2 - x_i^2)\Big).$$

Using this inequality, we can find an upper bound on the acceptance probability



a{x) = / a{x,y)Q{x,dy) < 



{yf~xl)X + 



26 



Completing the square and using the normalisation constant yields 



dy. 



a{x) < 



^ 771 



2S\ + 1 



{2SX + 1) 



dy 



< (l + 2A5)-^exp I ^ 



6X^ 



■2 2 

-I X, 



- (2^ + 1) 

For X S i?™(0) in Ha, using S = mT'^ and setting A = m^'' 

/ 2-2cr-a-26\ 

a{x) < (1 + 2to-(''+''))-t'cxp f j. 

In order to get decay from the first factor we need a + h < \ and to prevent 
growth from the second a + 2h > 2 — a which corresponds to a + 26 > 1 for a 
sufficiently close to | . This can be satisfied with h = '^('^-"■) g^jj^j u = Mii <; i ^ 
In this case the first factor decays faster than any negative power of m since 

$$(1 + 2m^{-(a+b)})^{-\frac{m}{2}} = \exp\Big(-\frac{m}{2}\log\big(1 + 2m^{-(a+b)}\big)\Big) \le \exp\big(-C m^{1-(a+b)}\big).$$
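The last inequality can be made explicit. Since $\log(1+u)$ is concave, it lies above its chord on $[0,2]$, that is $\log(1+u) \ge \frac{\log 3}{2}u$ for $u \in [0,2]$; with $u = 2m^{-(a+b)}$ this gives the bound with $C = \frac{\log 3}{2}$. The constant is one admissible choice introduced here for illustration and is not claimed to be the one used by the authors. The sketch below, with hypothetical function names and with `s` standing for $a+b$, compares both sides on a logarithmic scale.

```python
import math

def log_first_factor(m, s):
    """log of (1 + 2*m**(-s))**(-m/2), where s stands for a + b."""
    return -0.5 * m * math.log1p(2.0 * m ** (-s))

def log_upper_bound(m, s):
    """log of exp(-C * m**(1 - s)) with the explicit choice C = log(3)/2."""
    return -0.5 * math.log(3.0) * m ** (1.0 - s)
```

For $a + b < 1$ the exponent $m^{1-(a+b)}$ grows polynomially in $m$, so the factor eventually drops below $m^{-p}$ for every fixed $p$.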

For the second part of the proof we use $\alpha(x,y) \le 1$ and $A = \{x \in \mathbb{R}^m \mid x_1 > 0\}$,
which by symmetry satisfies $\gamma_m(A) = \frac12$, to bound the conductance



C 

2 



d A Ac \ ^ — 1 



- yif /{25) dxdy 



< 



A A 
2 25 



exp(- 



J -co 
oo p — 



27rV25 



-dyi exp ( -^^1 I dxi 



^ exp(-iz2) , . 2 1 , 

dyi exp ~-x^ dxi. 



27T 



Combining Fernique's theorem and Markov's inequality (Lemma A.2) yields



C<K I cxp{-l-{^^^)xl)dx < Kd2TT 



S + 1 



< Km 2 ^ 



so that the claim follows again from Cheeger's inequality. 



□ 
3 Spectral Gap: Proofs

We check the three conditions of the weak Harris theorem (Proposition 2.6) for
globally and locally Lipschitz $\Phi$ (see (1.1)) in Sections 3.1 and 3.2 respectively.
For each condition we use the following lemma for the dependence of the constants
$l$, $K$, $c$ and $s$ in the weak Harris theorem on $m$. This allows us to conclude that
there is $\tilde{n}(m) \le \tilde{n}$ such that

$$\tilde{d}(\nu_1\mathcal{P}^{\tilde{n}}, \nu_2\mathcal{P}^{\tilde{n}}) \le \frac12\,\tilde{d}(\nu_1,\nu_2), \qquad \tilde{d}(\nu_1\mathcal{P}_m^{\tilde{n}}, \nu_2\mathcal{P}_m^{\tilde{n}}) \le \frac12\,\tilde{d}(\nu_1,\nu_2)$$

for all probability measures $\nu_1, \nu_2$ on $\mathcal{H}$ and $P_m\mathcal{H}$ respectively.

Replacing $r(s)$ by $r(s) \wedge \frac{\rho}{2}s$ only weakens the condition (2.4), so we can and will
assume that $r(s) \le \rho s/2$.



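Lemma 3.1. Let $f : \mathbb{R} \to \mathbb{R}$ be monotone increasing, then
\[
\int f(\|\xi\|)\,d\gamma_m(\xi) \le \int f(\|\xi\|)\,d\gamma(\xi)
\]
and in particular
\[
\gamma_m(B_R(0)) \ge \gamma(B_R(0)). \tag{3.1}
\]
Proof. The truncated Karhunen-Loeve expansion relates $\gamma_m$ and $\gamma$ and yields
\[
\sum_{i=1}^m \lambda_i\xi_i^2 \le \sum_{i=1}^\infty \lambda_i\xi_i^2.
\]
Hence the result follows by monotonicity of the integral and of $f$:
\[
\int f(\|\xi\|)\,d\gamma_m(\xi) = \mathbb{E}\, f\Big(\Big(\sum_{i=1}^m \lambda_i\xi_i^2\Big)^{1/2}\Big) \le \mathbb{E}\, f\Big(\Big(\sum_{i=1}^\infty \lambda_i\xi_i^2\Big)^{1/2}\Big) = \int f(\|\xi\|)\,d\gamma(\xi).
\]
This yields (3.1) by inserting $f = \chi_{B_R(0)^c}$, viewed as a function of $\|\xi\|$.

□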
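The monotonicity in Lemma 3.1 can be illustrated by a small Monte Carlo experiment. This is only a sketch under stated assumptions: the eigenvalues $\lambda_i = i^{-2}$, the function name `ball_probabilities` and the sample sizes are hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np


def ball_probabilities(eigenvalues, radius, dims, n_samples=20000, seed=0):
    """Monte Carlo estimates of gamma_m(B_R(0)) for several truncation levels m.

    The same standard normal coefficients are reused for every m, so the
    truncated squared norm is nondecreasing in m sample by sample.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigenvalues, dtype=float)
    xi = rng.standard_normal((n_samples, lam.size))
    # Column m-1 holds sum_{i <= m} lambda_i * xi_i^2 (truncated Karhunen-Loeve norm).
    partial_sq_norms = np.cumsum(lam * xi**2, axis=1)
    return {m: float(np.mean(partial_sq_norms[:, m - 1] <= radius**2)) for m in dims}


if __name__ == "__main__":
    lam = 1.0 / np.arange(1, 201) ** 2  # hypothetical eigenvalues lambda_i = i^{-2}
    print(ball_probabilities(lam, radius=1.0, dims=[1, 5, 50, 200]))
```

Because every sample's truncated norm grows with $m$, the estimated ball probabilities are nonincreasing in $m$, in line with (3.1).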



We conclude this section by showing that $\mu$ and $\mu_m$ are the unique invariant
measures for $\mathcal{P}$ and $\mathcal{P}_m$ respectively.



3.1 Global log-Lipschitz density 

In this section we will prove Theorem 2.12 by checking the three conditions of
the weak Harris theorem for
\[
d(x,y) = 1 \wedge \frac{\|x-y\|}{\varepsilon}. \tag{3.2}
\]

3.1.1 Lyapunov Functions

Under Assumption 2.10 we show the existence of a Lyapunov function $V$. This
relies on the decay of $V$ on $B_{r(\|x\|)}((1-\rho)x)$ and the fact that the probability of
the next step of the algorithm lying in that ball can be bounded below by
Fernique's theorem, which we recall here.

Proposition 3.2. (Fernique's theorem, see e.g. [73 [WH/ ) Let $\gamma = \mathcal{N}(m, C)$
be a Gaussian measure on a Banach space $X$, then for $\beta$ small enough
\[
\int_X \exp(\beta\|u\|^2)\,d\gamma(u) = F_\beta < \infty.
\]
Moreover, to deal with proposals outside $B_{r(\|x\|)}((1-\rho)x)$ we use

Proposition A.1 (Appendix). For small enough $\beta$ and $\alpha \in \mathbb{R}$ there is a constant
$C_{\alpha,\beta}$ such that
\[
\int_{\{\|u\|>K\}} \exp(\alpha\|u\|)\,d\gamma(u) \le C_{\alpha,\beta}\,e^{-\beta K^2+\alpha K}.
\]

Lemma 3.3. Suppose Assumption 2.10 is satisfied with either:

1. $r(\|x\|) = r \in \mathbb{R}$; or

2. $r(\|x\|) = r\|x\|^a$, $r > 0$ and $a \in (\frac12, 1)$.

Then the function $V(x) = \|x\|^i$ with $i \in \mathbb{N}$ in the first case, and additionally
$V(x) = \exp(v\|x\|)$ in the second case, are Lyapunov functions for both $\mathcal{P}$ and
$\mathcal{P}_m$ with constants $l$ and $K$ uniform in $m$.

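Proof. In both cases we choose $R$ as in Assumption 2.10 and set
\[
\sup_{x\in B_R(0)} \mathcal{P}V(x) \le \sup_{x\in B_R(0)}\int \big(\|x\| + \sqrt{2\delta}\|\xi\|\big)^i\,d\gamma(\xi) \le R^i + C =: K_1 < \infty
\]
by Fernique's theorem. Now let $x \in B_R(0)^c$, then there is $0 < \tilde l < 1$ such that
\[
\sup_{y \in B_{r(\|x\|)}((1-\rho)x)} V(y) \le \tilde l\,V(x). \tag{3.3}
\]
We denote by $A = \{\omega \mid \sqrt{2\delta}\|\xi\| \le r(\|x\|)\}$ the event that the proposal lies in
a ball with a lower bound on the acceptance probability due to Assumption 2.10, and
bound
\begin{align*}
\mathcal{P}V &\le \mathbb{P}(A)\big[\mathbb{P}(\text{accept}\mid A)\,\tilde l\,V(x) + \mathbb{P}(\text{reject}\mid A)\,V(x)\big] + \mathbb{E}(V(p_x)\vee V(x); A^c) \\
&\le \mathbb{P}(A)\big[1 - \mathbb{P}(\text{accept}\mid A)(1-\tilde l)\big]V(x) + \mathbb{E}(V(p_x)\vee V(x); A^c) \\
&\le \theta\,\mathbb{P}(A)V(x) + \mathbb{E}(V(p_x)\vee V(x); A^c)
\end{align*}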
Figure 1: Contraction



for some $\theta < 1$. It remains to consider $\mathbb{E}(V(p_x)\vee V(x); A^c)$, where the differences
will arise between cases 1 and 2. For the first case we have by Fernique's theorem
\begin{align*}
\mathbb{E}(V(p_x)\vee V(x); A^c) &\le \int_{\sqrt{2\delta}\|\xi\| > r} \|x\|^i \vee \big((1-\rho)\|x\| + \sqrt{2\delta}\|\xi\|\big)^i\,d\gamma(\xi) \\
&\le \mathbb{P}(A^c)\,\|x\|^i + K_2.
\end{align*}
Since a ball around the mean of a Gaussian always has positive mass (Theorem 3.6.1 in [6]) we note
\[
\mathcal{P}V \le V(x)\big(\mathbb{P}(A)\theta + \mathbb{P}(A^c)\big) + K_2 \le lV + K_2.
\]
For the second case we estimate
\[
\mathbb{E}(V(p_x)\vee V(x); A^c) \le \int_{\|\sqrt{2\delta}\xi\| > r\|x\|^a} e^{v(\|x\| + \sqrt{2\delta}\|\xi\|)}\,d\gamma(\xi).
\]
The right hand side above is uniformly bounded in $x \in B_R(0)^c$ by some $K_2$ due
to Proposition A.1. Hence in both cases there is an $l < 1$ such that
\[
\mathcal{P}V(x) \le lV(x) + \max(K_1,K_2) \quad \forall x.
\]
For the $m$-dimensional approximation the probability of the event $A$ is larger
by Lemma 3.1 and $\mathbb{P}(\text{accept}\mid A)$ has the same lower bound, and therefore $l(m)$ is
smaller than $l$. Similarly $K_i(m)$ is smaller than $K_i$ by Lemma 3.1. □



3.1.2 The d-Contraction 

In this section we show that $\mathcal{P}$ is $d$-contracting for $d(x,y) = 1 \wedge \frac{\|x-y\|}{\varepsilon}$ by
bounding $d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot))$ (see (2.1)) with a particular coupling. For $x$ and $y$
we choose the same noise $\xi$ giving rise to the proposals $p_x(\xi)$ and $p_y(\xi)$ and the
same uniform random variable for acceptance. Subsequently we will refer to
this coupling as the basic coupling and bound the expectation of $d$ under this
coupling by inspecting the following cases:



1. The proposals for the algorithm started at x and y are both accepted. 



2. Both proposals are rejected. 

3. One of the proposals is accepted and the other rejected. 
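The basic coupling can be sketched in code as follows. This is an illustrative sketch only: it assumes a finite-dimensional truncation with diagonal covariance, the pCN proposal $p_x(\xi) = (1-2\delta)^{1/2}x + \sqrt{2\delta}\,\xi$ and the acceptance probability $1 \wedge \exp(\Phi(x) - \Phi(p_x))$; the function name `pcn_coupled_step` and its arguments are hypothetical.

```python
import numpy as np


def pcn_coupled_step(x, y, phi, delta, sqrt_cov_diag, rng):
    """One step of two pCN chains driven by the same noise and the same uniform."""
    xi = sqrt_cov_diag * rng.standard_normal(x.shape)  # shared noise, xi ~ N(0, C)
    u = rng.uniform()                                   # shared acceptance variable
    shrink = np.sqrt(1.0 - 2.0 * delta)
    px = shrink * x + np.sqrt(2.0 * delta) * xi         # proposal for the chain at x
    py = shrink * y + np.sqrt(2.0 * delta) * xi         # proposal for the chain at y
    accept_x = u <= min(1.0, np.exp(phi(x) - phi(px)))
    accept_y = u <= min(1.0, np.exp(phi(y) - phi(py)))
    # The three cases of the text: both accept, both reject, exactly one accepts.
    return (px if accept_x else x), (py if accept_y else y), accept_x, accept_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0, y0 = np.ones(3), -np.ones(3)
    print(pcn_coupled_step(x0, y0, lambda z: float(np.sum(z**2)), 0.1, np.ones(3), rng))
```

When both proposals are accepted the shared noise cancels in the difference, so the distance between the two chains shrinks by the factor $(1-2\delta)^{1/2}$.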



Lemma 3.4. If $\Phi$ in (1.1) satisfies Assumptions 2.10 and 2.11 then $\mathcal{P}$ and $\mathcal{P}_m$
are $d$-contracting for $d$ as in (3.2) with a contraction constant uniform in $m$.



Proof. By Definition 2.2 we only need to consider $x$ and $y$ such that $d(x,y) < 1$,
which implies $\|x-y\| < \varepsilon$. Later we will choose $\varepsilon \ll 1$, hence if $\|x-y\| < \varepsilon$
then either $x,y \in B_R(0)$ or $x,y \in B_{\tilde R}(0)^c$ with $\tilde R = R-1$, and we will treat the
two cases separately. We assume without loss of generality $\|y\| \ge \|x\|$.
For $x,y \in B_R(0)$ and $A = \{\omega \mid \sqrt{2\delta}\|\xi\| \le R\}$ the basic coupling yields
\begin{align*}
d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le{}& \mathbb{P}(A)\big[\mathbb{P}(\text{both accept}\mid A)(1-\rho)d(x,y) + \mathbb{P}(\text{both reject}\mid A)d(x,y)\big] + \mathbb{P}(A^c)d(x,y) \\
&+ \int |\alpha(x,p_x)(\xi) - \alpha(y,p_y)(\xi)|\,d\gamma(\xi) \tag{3.4}
\end{align*}
where the last term bounds the case that only one of the proposals is accepted.
Using the bound $\mathbb{P}(\text{both reject}\mid A) \le 1 - \mathbb{P}(\text{both accept}\mid A)$ yields a non-trivial
convex combination of $d$ and $(1-\rho)d$, since the probability $\mathbb{P}(\text{both accept}\mid A)$
is bounded below by $\exp(-\sup\{\Phi(z) \mid \|z\| \le 2R\} + \inf\{\Phi(z) \mid \|z\| \le 2R\})$ due
to (1.5). The first two summands in (3.4) form again a non-trivial convex
combination, since $\mathbb{P}(A) > 0$, so that there is $\tilde c < 1$ with
\[
d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le \tilde c\,d(x,y) + \int |\alpha(x,p_x)(\xi) - \alpha(y,p_y)(\xi)|\,d\gamma(\xi).
\]
Note that $\tilde c$ is independent of $\varepsilon$. For the last term we use that $1 \wedge \exp(\cdot)$ has
Lipschitz constant 1:
\begin{align*}
\int |\alpha(x,p_x)(\xi) - \alpha(y,p_y)(\xi)|\,d\gamma(\xi) &\le \int |\Phi(p_x) - \Phi(p_y)| + |\Phi(x) - \Phi(y)|\,d\gamma(\xi) \\
&\le 2L\|x-y\| \le 2L\varepsilon\,d(x,y),
\end{align*}
which yields an overall contraction for $\varepsilon$ small enough.

Similarly we get for $x,y \in B_{\tilde R}(0)^c$ and $B = \{\omega \mid \sqrt{2\delta}\|\xi\| \le r(\|x\| \wedge \|y\|)\}$
\begin{align*}
d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le{}& \mathbb{P}(B)\big(\mathbb{P}(\text{both accept}\mid B)(1-\rho) + \mathbb{P}(\text{both reject}\mid B)\big)d(x,y) + \mathbb{P}(B^c)d(x,y) \\
&+ \int |\alpha(x,p_x)(\xi) - \alpha(y,p_y)(\xi)|\,d\gamma(\xi).
\end{align*}
The lower bound for $\mathbb{P}(\text{both accept}\mid B)$ is this time due to Assumption 2.10.

All occurring ball probabilities are larger in the $m$-dimensional approximation
due to Lemma 3.1 and the acceptance probability is larger since inf and sup
are applied to smaller sets, thus the contraction constant is uniform in $m$. □
3.1.3 The d-Smallness

The $d$-smallness of the level sets of $V$ is achieved by replacing the Markov kernel
by the $n$-step one. This preserves the $d$-contraction and the Lyapunov function.
The variable $n$ is chosen large enough so that if the algorithms started at $x$ and
$y$ both accept $n$ times in a row $d$ drops below $\frac12$, hence
\[
d(\mathcal{P}^n(x,\cdot),\mathcal{P}^n(y,\cdot)) \le 1 - \frac12\,\mathbb{P}(\text{accept $n$ times}).
\]



Remark. It is necessary to replace the one step Markov kernel with the $n$-step one,
which can be seen by considering the Wiener measure on $(C([0,1]), \|\cdot\|_\infty)$ and
$\Phi = 0$ (our theory also applies to Banach spaces, see Section 5). For the
constant zero path $\psi$ and
\[
\varphi_n(x) = \begin{cases} nx, & x \le 1/n \\ 1, & x > 1/n \end{cases}
\]
we have $\|\psi - \varphi_n\|_\infty = 1$, but the probability of a transition to a common $\varepsilon$-neighbourhood using the
proposal (1.5) converges to zero as $n \to \infty$.



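Lemma 3.5. If $S$ is bounded, then there is an $n$ and $0 < s < 1$ such that for
all $x,y \in S$, $m \in \mathbb{N}$ and for $d$ as in (3.2)
\[
d(\mathcal{P}_m^n(x,\cdot),\mathcal{P}_m^n(y,\cdot)) \le s \quad\text{and}\quad d(\mathcal{P}^n(x,\cdot),\mathcal{P}^n(y,\cdot)) \le s.
\]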

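Proof. In order to get an upper bound for $d(\mathcal{P}^n(x,\cdot),\mathcal{P}^n(y,\cdot))$ we choose the
basic coupling (see Section 3.1.2) as before. Let $R$ be such that $S \subset B_R(0)$
and let $B$ be the event that both instances of the algorithm accept $n$ times in a
row. In the event of $B$ we have, using (3.2),
\[
d(X_n,Y_n) \le \frac1\varepsilon\|X_n - Y_n\| \le \frac1\varepsilon(1-\rho)^n\|X_0-Y_0\| \le \frac1\varepsilon(1-\rho)^n\operatorname{diam} S \le \frac12
\]
for $n$ large enough, which implies that if $X_0$ and $Y_0$ are in $S$ then $d(X_n,Y_n) \le \frac12$, hence
\[
d(\mathcal{P}^n(x,\cdot),\mathcal{P}^n(y,\cdot)) \le \frac12\,\mathbb{P}(B) + (1-\mathbb{P}(B))\cdot 1 < 1.
\]
We write $\xi_i$ for the noise in the $i$-th step and bound
\begin{align*}
\mathbb{P}(B) &\ge \mathbb{P}\Big(\|\sqrt{2\delta}\,\xi_i\| \le \frac Rn,\ i = 1\dots n\Big)\,\mathbb{P}\Big(\text{both accept $n$ times}\ \Big|\ \|\sqrt{2\delta}\,\xi_i\| \le \frac Rn,\ i = 1\dots n\Big) \\
&\ge \mathbb{P}\Big(\|\sqrt{2\delta}\,\xi\| \le \frac Rn\Big)^n \exp\Big(-\sup_{z\in B_{2R}(0)}\Phi(z) + \inf_{z\in B_{2R}(0)}\Phi(z)\Big)^n > 0,
\end{align*}
uniformly for all $X_0,Y_0 \in B_R(0)$. For the $m$-dimensional approximation the
lower bound exceeds that in the infinite dimensional case due to Lemma 3.1
and the fact that
\[
-\sup_{z\in B_{2R}(0)}\Phi(z) + \inf_{z\in B_{2R}(0)}\Phi(z) \le -\sup_{z\in B_{2R}(0)}\Phi(P_mz) + \inf_{z\in B_{2R}(0)}\Phi(P_mz)
\]
so that the claim follows. □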
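The number of consecutive accepted steps needed in the proof can be computed explicitly. The helper below is a hypothetical illustration (its name and interface are not from the paper): it returns the smallest $n$ with $\frac1\varepsilon(1-\rho)^n \operatorname{diam} S \le \frac12$.

```python
def steps_for_smallness(rho: float, eps: float, diam: float) -> int:
    """Smallest n >= 0 with (1 - rho)**n * diam / eps <= 1/2."""
    if not (0.0 < rho < 1.0) or eps <= 0.0 or diam < 0.0:
        raise ValueError("need 0 < rho < 1, eps > 0 and diam >= 0")
    n = 0
    factor = diam / eps
    while factor > 0.5:
        factor *= 1.0 - rho  # one jointly accepted step contracts by (1 - rho)
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical values: small eps and a set of diameter 10.
    print(steps_for_smallness(rho=0.05, eps=0.01, diam=10.0))
```

The required $n$ grows only logarithmically in $\operatorname{diam} S / \varepsilon$, which is why the choice of a small $\varepsilon$ earlier in the argument is harmless here.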

3.2 Local log-Lipschitz density 

Now we allow the local Lipschitz constant
\[
\phi(r) = \sup_{x \ne y \in B_r(0)} \frac{|\Phi(x) - \Phi(y)|}{\|x-y\|}
\]
to grow in $r$. In order to deal with the situation where only one proposal is
accepted, in proving that $\mathcal{P}$ is $d$-contracting, we choose $d$ in a way such that two
points far out have to be closer in $\|\cdot\|$ in order to be considered "close", i.e.
$d(x,y) < 1$. This is inspired by constructions in [21, 22]. Setting
\[
A(T,x,y) := \{\psi \in C^1([0,T],\mathcal{H}),\ \psi(0) = x,\ \psi(T) = y,\ \|\dot\psi\| = 1\},
\]
we define metrics $d$ and $\bar d$ by
\[
d(x,y) = 1 \wedge \bar d(x,y), \qquad \bar d(x,y) = \inf_{T,\,\psi\in A(T,x,y)} \frac1\varepsilon\int_0^T \exp(\eta\|\psi\|)\,dt, \tag{3.5}
\]
where $\varepsilon$ and $\eta$ are chosen along the way depending on $\Phi$ and $\gamma$. The situation
is different from before because even in the case "both accept" the distance can
increase because of the weight. In order to control this we note



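Lemma 3.6. Let $\psi$ be a path connecting $x$, $y$ then for $d$ as in (3.5)

1. $\frac1\varepsilon\int_0^T \exp(\eta\|\psi\|)\,dt < 1$ implies $T \le J := \varepsilon\exp\big(-\eta((\|x\| \vee \|y\| - \varepsilon) \vee 0)\big) \le \varepsilon$.

2. $\bar d(x,y) \le \frac{\|x-y\|}{\varepsilon}\exp\big(\eta(\|x\| \vee \|y\|)\big)$ and for $d < 1$
\[
\frac{\|x-y\|}{\varepsilon}\exp\big(\eta((\|x\| \vee \|y\| - J) \vee 0)\big) \le d(x,y).
\]

3. For $d < 1$ we have
\[
\frac{d(p_x,p_y)}{d(x,y)} \le (1-2\delta)^{\frac12}\,e^{\eta[-\rho(\|x\|\vee\|y\|) + \sqrt{2\delta}\|\xi\| + J]}.
\]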

Proof. For the first statement, observe that 
For the second part we set $\psi$ to be the line connecting $x$ and $y$ to get the upper
bound, and for the lower bound we use $\|\psi\| \ge (\|x\| \vee \|y\| - J) \vee 0$ from the first
part combined with the fact that $T \le \varepsilon$. Using 2. we get
\begin{align*}
d(p_x,p_y) &\le (1-2\delta)^{\frac12}\,\frac{\|x-y\|}{\varepsilon}\,e^{\eta(\|p_x\|\vee\|p_y\|)} \\
&\le (1-2\delta)^{\frac12}\,e^{\eta[-\rho(\|x\|\vee\|y\|) + \sqrt{2\delta}\|\xi\| + J]}\,\frac{\|x-y\|}{\varepsilon}\,e^{\eta(\|x\|\vee\|y\| - J)} \\
&\le (1-2\delta)^{\frac12}\,e^{\eta[-\rho(\|x\|\vee\|y\|) + \sqrt{2\delta}\|\xi\| + J]}\,d(x,y),
\end{align*}
which is precisely the required bound. □
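The upper bound in item 2 of Lemma 3.6 comes from evaluating the weighted length along the straight line. A minimal numerical sketch, assuming a finite-dimensional Euclidean space and hypothetical function names, compares that line integral with the closed-form bound:

```python
import numpy as np


def line_path_cost(x, y, eps, eta, n_grid=2001):
    """(1/eps) * int_0^T exp(eta * ||psi(t)||) dt along the unit-speed straight line."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    length = np.linalg.norm(y - x)  # T = ||x - y|| for a unit-speed line
    s = np.linspace(0.0, 1.0, n_grid)
    points = x[None, :] + s[:, None] * (y - x)[None, :]
    integrand = np.exp(eta * np.linalg.norm(points, axis=1))
    # Trapezoidal rule in the variable t = s * length.
    integral = length * np.sum((integrand[1:] + integrand[:-1]) / 2.0) / (n_grid - 1)
    return integral / eps


def upper_bound(x, y, eps, eta):
    """Right-hand side of item 2: ||x - y|| / eps * exp(eta * max(||x||, ||y||))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.linalg.norm(x - y) / eps * np.exp(eta * max(np.linalg.norm(x), np.linalg.norm(y)))


if __name__ == "__main__":
    print(line_path_cost([3.0, 0.0], [0.0, 4.0], eps=0.1, eta=0.2),
          upper_bound([3.0, 0.0], [0.0, 4.0], eps=0.1, eta=0.2))
```

Since the norm is convex along a segment, the integrand never exceeds its value at the endpoint of larger norm, which is exactly why the closed-form bound dominates the line integral.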
3.2.1 Lyapunov Functions 

This condition depends neither on the distance function $d$ nor on the Lipschitz
properties of $\Phi$, hence Lemma 3.3 applies.

3.2.2 The d-Contraction

Lemma 3.7. If $\Phi$ satisfies Assumptions 2.10 and 2.13 then $\mathcal{P}$ and $\mathcal{P}_m$ are
$d$-contracting for $d$ as in (3.5) with a contraction constant uniform in $m$.



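Proof. First suppose $x,y \in B_R(0)$ with $d(x,y) < 1$ and denote the event $A =
\{\omega \mid \sqrt{2\delta}\|\xi\| \le 2R\}$, where we first choose $R$ large, then $\eta$ small and at last $\varepsilon$ small.
We have
\begin{align*}
d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le{}& \mathbb{P}(A)\big[\mathbb{P}(\text{both accept}\mid A)\,\sup_A d(p_x,p_y) \tag{3.6} \\
&+ \mathbb{P}(\text{both reject}\mid A)\,d(x,y)\big] \\
&+ \mathbb{E}\big((\alpha(x,p_x)\wedge\alpha(y,p_y))\,d(p_x,p_y); A^c\big) \\
&+ \mathbb{E}\big((1 - \alpha(x,p_x)\vee\alpha(y,p_y))\,d(x,y); A^c\big) \tag{3.7} \\
&+ \mathbb{P}(\text{only one accepts})\cdot 1
\end{align*}
where the first two lines deal with both accept and both reject in the case of $A$,
and the third and fourth lines consider the same cases in the event of $A^c$. The last
line takes care of only one accepts. For the first two lines of (3.6) we argue that
\[
\mathbb{P}(\text{both accept}\mid A) \ge \inf_{x,z\in B_{3R}(0)} \mathbb{P}(\text{accepts}\mid p_x = z) = \exp\big(-\Phi^+(3R) + \Phi^-(3R)\big).
\]
If both are accepted we know from Lemma 3.6 that
\begin{align*}
\frac{d(p_x,p_y)}{d(x,y)} &\le (1-2\delta)^{\frac12}\exp\big(-\eta\rho(\|x\|\vee\|y\|) + \eta(\sqrt{2\delta}\|\xi\| + J)\big) \\
&\le (1-2\delta)^{\frac12}\,e^{\eta(3R+J)} \le (1-\tilde\rho)
\end{align*}
for some $\tilde\rho \in (0,\rho)$, where the last step follows for small enough $\eta$. Using the complementary
probability we can estimate
\[
\mathbb{P}(\text{both reject}\mid A) \le 1 - \mathbb{P}(\text{both accept}\mid A).
\]
Combining both estimates we get $\mathbb{P}(A)\big(1 - \mathbb{P}(\text{both accept}\mid A)\,\tilde\rho\big)$ as coefficient
in front of $d(x,y)$. In order to show contraction we have to show that the
expression in the third and fourth line of (3.6) is close to $\mathbb{P}(A^c)\cdot d(x,y)$. We
note that
\begin{align*}
&\mathbb{E}\big((1 - \alpha(x,p_x)\vee\alpha(y,p_y))\,d(x,y); A^c\big) + \mathbb{E}\big((\alpha(x,p_x)\wedge\alpha(y,p_y))\,d(p_x,p_y); A^c\big) \\
&\qquad\le \mathbb{E}\big(d(p_x,p_y)\vee d(x,y); A^c\big) \le d(x,y)\,\mathbb{E}\Big(\frac{d(p_x,p_y)}{d(x,y)}\vee 1; A^c\Big) \\
&\qquad\le d(x,y)\int_{\sqrt{2\delta}\|\xi\| > 2R} 1 \vee e^{\eta(\sqrt{2\delta}\|\xi\| + J)}\,d\gamma(\xi)
\end{align*}
where the last step followed by Lemma 3.6. For small $\eta$ the above is arbitrarily
close to $\mathbb{P}(A^c)\cdot d(x,y)$ by the dominated convergence theorem. By
writing the integrand as $\chi_{\sqrt{2\delta}\|\xi\| > 2R}\big(1 \vee \exp(\eta(\sqrt{2\delta}\|\xi\| + J))\big)$ and applying
Lemma 3.1 we conclude that this holds uniformly in $m$. Combining the first
four lines, the coefficient in front of $d(x,y)$ is less than 1 independently of $\varepsilon$.
Only $\mathbb{P}(\text{only one accepts})\cdot 1$ is left to bound in terms of $d(x,y)$:
\begin{align*}
\mathbb{P}(\text{only one accepts}) &= \int |\alpha(x,p_x) - \alpha(y,p_y)|\,d\gamma(\xi) \\
&\le \int \big(|\Phi(p_x) - \Phi(p_y)| + |\Phi(x) - \Phi(y)|\big)\,d\gamma(\xi) \\
&\le \varepsilon\,d(x,y)\int \big(\phi((1-\rho)R + \sqrt{2\delta}\|\xi\|) + \phi(R)\big)\,d\gamma(\xi).
\end{align*}
The integral above is bounded by Fernique's theorem, hence for $\varepsilon$ small enough,
combining with the result above, we get an overall contraction.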

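Now let $x,y \in B_{\tilde R}(0)^c$ with $d(x,y) < 1$ and without loss of generality $\|y\| \ge
\|x\|$. Analogous to the above with $A = \{\omega \mid \|\sqrt{2\delta}\,\xi\| \le r(\|x\|)\}$ we have
\begin{align*}
d(\mathcal{P}(x,\cdot),\mathcal{P}(y,\cdot)) \le{}& \mathbb{P}(A)\big[\mathbb{P}(\text{both accept}\mid A)(1-\rho)d(x,y) + \mathbb{P}(\text{both reject}\mid A)d(x,y)\big] \\
&+ \mathbb{E}\big(d(x,y)\vee d(p_x,p_y); A^c\big) + \mathbb{P}(\text{only one accepts})\cdot 1.
\end{align*}
If "both accept" in the event of $A$ the contraction constant is smaller than $(1-\rho)$
since $r(s) \le \frac{\rho}{2}s$ and using Lemma 3.6. For the next term it yields
\begin{align*}
\mathbb{E}\big(d(p_x,p_y)\vee d(x,y); A^c\big) &\le d(x,y)\,\mathbb{E}\Big(\frac{d(p_x,p_y)}{d(x,y)}\vee 1; A^c\Big) \\
&\le d(x,y)\int_{A^c} 1 \vee e^{-\eta\rho(\|x\|\vee\|y\|) + \eta(\|\sqrt{2\delta}\,\xi\| + J)}\,d\gamma(\xi).
\end{align*}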



We denote the integral above by /, its integrand by /(C) and F > then 



I<h+h= J fiOdliO + J fiOdliO 

p(.\\y\\--') + F>\\V2Si\\>r{\\x\\A\\y\\) \\V2S^\\>p{\\y\\-J)+F 

for the first part we have the upper bound V{A'^)e^^^'^ . For the second part we 
take g e X* with \\g\\ — 1 and note that {x\g{x) > R} C Bij{OY which yields 

liBniOr) > li{x\gix) > R}) > cxp(-^i?2 + () 

using the one dimensional lower bound. For the uniformity in m we choose 
g = e\. We incorporate all occurring constants into C and use Proposition A.l 
to bound 

h < P(^=)exp(^^^M!-p77(||y||- J) 

vV2Sip{\\y\\ -J)+F)~ fiV2Sipi\\y\\ - J) + Ff + c) . 

For any $\tau > 0$ we first choose $F$ large enough and then $\eta$ small enough so that
$d(x,y)\,I \le (1+\tau)\,\mathbb{P}(A^c)\,d(x,y)$. Again the estimates above are independent of $\varepsilon$, which
we choose small in order to bound $\mathbb{P}(\text{only one accepts})$ in terms of $d(x,y)$.
We calculate as above



J \aix,p^) - a{y,py)\dj{^) 

< j Mx) ~ $(y)| + - ^py)\ djiO 

< f my\\) + 4>{\\p.\\y\\py\\)d7{0\\x-y\\ 



< I^M.e'^ll^ll + J m - p) \\y\\ + m\)dl{0 \ \\x ~ y 



where the last step follows using the upper bound for ||a; — y\\ from Lemma 3.6 
Choosing k — ^ and e small enough, we can guarantee a uniform contraction. 
Checking line by line, the same is true for the m-dimensional approximation. □ 
3.2.3 The d-Smallness

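Similarly to the globally Lipschitz case we have

Lemma 3.8. If $S$ is bounded, then $\exists n \in \mathbb{N}$ and $0 < s < 1$ such that for all
$x,y \in S$, $m \in \mathbb{N}$ and for $d$ as in (3.5)
\[
d(\mathcal{P}_m^n(x,\cdot),\mathcal{P}_m^n(y,\cdot)) \le s \quad\text{and}\quad d(\mathcal{P}^n(x,\cdot),\mathcal{P}^n(y,\cdot)) \le s.
\]
Proof. By Lemma 3.6, $d$ and $\|\cdot\|$ are comparable on bounded sets. If $X_0,Y_0 \in
B_R(0)$ and both algorithms accept $n$ proposals in a row that all lie in $B_{2R}(0)$, then
\[
d(X_n,Y_n) \le \frac1\varepsilon\,e^{2\eta R}\operatorname{diam}(S)\,(1-2\delta)^{n/2} \le \frac12
\]
for $n$ large enough. Hence the result follows analogously to Lemma 3.5. □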



3.3 Uniqueness of the Invariant Measure 

Proposition 3.9. If the conditions of one of Theorem 2.12 or 2.14 are satisfied,
then $\mu$ and $\mu_m$ are the unique invariant measures for $\mathcal{P}$ and $\mathcal{P}_m$ respectively.

Proof. The space $(\mathcal{H}, d_0)$ with $d_0(x,y) = 1 \wedge \|x-y\| \le d(x,y)$ is complete because $(\mathcal{H}, \|\cdot\|)$
is complete and convergence in both spaces is equivalent. Using the dominated
convergence theorem for
\[
\mathcal{P}\varphi(x) = \int \alpha(x,p_x)\,\varphi(p_x)\,d\gamma(\xi) + \varphi(x)\int \big(1 - \alpha(x,p_x)\big)\,d\gamma(\xi),
\]
the Markov kernel $\mathcal{P}$ is Feller. The result is now a direct consequence of the
second part of the weak Harris theorem. □



4 Results Concerning the Sample-Path Average 

In this section we focus on sample path properties of the pCN algorithm. We
prove a strong law of large numbers, a CLT and a bound on the MSE. This allows
us to quantify the approximation of $\mu(f)$ by
\[
S_{n,n_0}(f) = \frac1n\sum_{j=1}^n f(X_{j+n_0}).
\]
We present the results that are consequences of the Wasserstein and the $L^2_\mu$-spectral
gap in Sections 4.1 and 4.2 respectively.
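The ergodic average with burn-in is straightforward to compute from a stored trajectory. The following is a minimal sketch; the function name `ergodic_average` and the convention that `chain[k]` holds $X_k$ (so `chain[0]` is the initial state) are assumptions made here for illustration.

```python
from typing import Callable, Sequence


def ergodic_average(f: Callable[[float], float], chain: Sequence[float], n: int, n0: int = 0) -> float:
    """S_{n,n0}(f) = (1/n) * sum_{j=1}^{n} f(X_{j+n0}), with chain[k] holding X_k."""
    if n <= 0 or n0 < 0 or n0 + n >= len(chain):
        raise ValueError("need n > 0, n0 >= 0 and a chain of length at least n0 + n + 1")
    return sum(f(chain[j + n0]) for j in range(1, n + 1)) / n


if __name__ == "__main__":
    trajectory = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical states X_0, ..., X_4
    print(ergodic_average(lambda v: v, trajectory, n=2, n0=1))
```

With $n_0 = 0$ this reduces to the plain sample average $S_n = S_{n,0}$ used later in Section 4.2.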



4.1 Consequences of the Wasserstein Spectral Gap 

In this section we show the consequences of the Wasserstein spectral gap on the
sample average. Compared to the results from the $L^2_\mu$-spectral gap, they
apply to a smaller class of observables, but they hold for the algorithm started at
any deterministic point. Moreover, similar results also apply to non-reversible
Markov processes that have a Wasserstein spectral gap.

4.1.1 Proper Metric and Lipschitz Functionals

For the CLT below we need a Wasserstein spectral gap with respect to a metric,
since the Monge-Kantorovich duality is used for its proof [27]. The distance
\[
\tilde d = \sqrt{(1+\|x\|^i+\|y\|^i) \wedge \inf_{T,\,\psi\in A(T,x,y)} \frac1\varepsilon\int_0^T \exp(\eta\|\psi\|)\,dt\,(1+\|x\|^i+\|y\|^i)}
\]
does not necessarily satisfy the triangle inequality. Therefore we introduce
\[
d' = \sqrt{(1+\|x\|^i+\|y\|^i) \wedge \inf_{T,\,\psi\in A(T,x,y)} \frac1\varepsilon\int_0^T \exp(\eta\|\psi\|)(1+\|\psi\|^i)\,dt} \tag{4.1}
\]
and show that $d' \le \tilde d \le Cd'$, thus exponential convergence transfers from $\tilde d$ to
$d'$.

Lemma 4.1. For the distance-like function $\tilde d$ and the metric $d'$ as above there is a
$C$ such that
\[
d' \le \tilde d \le Cd'.
\]

Proof. Subsequently we assume without loss of generality that $\|y\| \ge \|x\|$. For
any path $\psi \in A$ we denote
\[
F(\psi) = \frac1\varepsilon\int_0^T \exp(\eta\|\psi\|)(1+\|\psi\|^i)\,dt.
\]
By reflecting all points $\psi(t)$ in $B_{\|y\|}(0)^c$ at $\partial B_{\|y\|}(0)$ we make $F(\psi)$ smaller,
hence we only have to consider $\psi$ that satisfy
\[
\|\psi(t)\| \le \|y\|, \quad t \in [0,T]. \tag{4.2}
\]
The first part follows due to $1+\|\psi\|^i \le 1+\|x\|^i+\|y\|^i$.

For the second part we will use that only have to consider x and y such that 

inf - reMvm)dt<{l + \\xr + \\y\n. (4.3) 

since the minimum expression in d and d' have (1 + ||a;||' + ||?/||*) in common. 

We will first use this to show that x and y have to be close, if they are far out 
we will show that any path close to the infimum has to satisfy \\y\\ > i]:> 

hence 1 + HV'!!* (1 + ||a;||* + ||j/|P) are comparable. On fixed bounded sets d' 
and d are comparable. In order to get a lower on F{tp) we distinguish between 
ip intersects or does not intersect ^^^(O). If the path lies completely outside the 
ball we have 

F{i!) > - ||a; - 2/11 exp(77i?)(l + R') 
If $\psi$ and $B_R(0)$ have an intersection then $\psi$ is longer than the shortest path
to $B_R(0)$



1 



\\y\\-R 



expirjiWyW - t))il + i\\y\\ - ty)dt 
> {\\y\\ - R) exp{7j{\\y\\ - R)){1 + {\\y\\ - Ry)dt 
We choose R — ^ and note that ^ > . which yields in both cases 



FW>^jx-y\\eMv\\y\\/m + {\\y\\/^y 



By (4.3 1 this implies 



(4.4) 



For X and y in Bq{0) we have that d < (2(5* + l)2rf' because of (4.2 1. It is 
only left consider x,y e Bq{OY for Q = Q - 4e exp(-77^)2'+i since (4.4 1 holds. 
Subsequently we will show that for Q and hence Q large enough it is sufficient to 
consider paths ■0 that do not intersect -6^(0) for R — Suppose the shortest 
the path would intersect 5^(0) then the functional is larger than the shortest 
path to the boundary of the ball, hence 



:"(ll'^ll-*)(l + (||y||-tr)dt 

n 

cxp(77|b||)(ry-i(l + ||2/|r)+^ry-i 

n 

exp(77i?)(77-i(l + ff) + ^7y-i 



7^\\y\rn 



R' 



(4.5) 



by z + 1 integration by parts. Let / be the line connecting x and y, then 
using (4.5) yields 

F{1) <\\\x-y\\ e/'ll^ll (1 + llylD < 4exp(7yM)2*+i(i + \\y\\^). 

For Q and in turn Q large enough we have F{ip) > F{1) by plugging R = ^ 
into (|4.5|. Hence for all t e [0,i] ||y|| > tp> ||y|| /2 and therefore 



2^+^(1 + ||V'll')>(l + lkll' + bll') 
which yields that max(2L% 2*+i)fi' > d. 



□ 
4.1.2 Strong Law of Large Numbers

In this section we will prove a strong law of large numbers for Lipschitz functions.
Since $\mu$ ($\mu_m$) is the unique invariant measure for $\mathcal{P}$ ($\mathcal{P}_m$), $\mu$ ($\mu_m$) is ergodic
and Birkhoff's ergodic theorem applies. Hence we only have to extend Birkhoff's
theorem from almost every to every initial condition to get a strong law of large
numbers.



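Theorem 4.2. In the setting of Theorem 2.12 or 2.14 suppose $\operatorname{supp}\mu = \mathcal{H}$
and $h : \mathcal{H} \to \mathbb{R}$ has Lipschitz constant $L$ with respect to $\tilde d$, then for arbitrary
$X_0 \in \mathcal{H}$
\[
\frac1n\sum_{i=1}^n h(X^i) \to \mathbb{E}_\mu h \quad\text{almost surely as } n \to \infty.
\]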

Proof. By Birkhoff's ergodic theorem we know that this is true for measurable
$h$ and a.e. initial condition. Because $\mu$ has full support, for any $t > 0$ we can
choose $Y_0$ such that $\tilde d(X_0,Y_0) \le t^2$ and Birkhoff's theorem applies to $Y_0$. Hence
\begin{align*}
\Big|\frac1n\sum_{i=1}^n h(X^i) - \mathbb{E}_\mu h\Big| &\le \Big|\frac1n\sum_{i=1}^n h(Y^i) - \mathbb{E}_\mu h\Big| + \frac1n\sum_{i=1}^n |h(X^i) - h(Y^i)| \\
&\le \Big|\frac1n\sum_{i=1}^n h(Y^i) - \mathbb{E}_\mu h\Big| + \frac1n\sum_{i=1}^n L\,\tilde d(X^i,Y^i).
\end{align*}
By the Wasserstein spectral gap we can couple $X^n$ and $Y^n$ such that
\[
\mathbb{E}\,\tilde d(X^n,Y^n) \le Cr^n\,\tilde d(X^0,Y^0)
\]
for some $0 < r < 1$. We then apply Markov's inequality to get



> c < c 

Since Birkhoff's theorem applies to the Markov process started at Yq we have 
limsup ^ J2 - E^h) > = P ^limsup ^ ^ \h{X') - h{Y') \ > cj 



< C 



L 



c(l-r) 



d{X",Y") 



Setting c—j- yields 



lim sup 



1 " \ 

- h{X' - Ef,h) < t 



>l-t- 



C 
1~' 
Using the fact that for $A_1 \supseteq A_2 \supseteq \dots$ we have $\lim_{n\to\infty}\mathbb{P}(A_n) = \mathbb{P}(A)$ with $A = \bigcap_n A_n$, the
result follows:
\[
\mathbb{P}\Big(\lim_{n\to\infty}\frac1n\sum_{i=1}^n h(X^i) = \mathbb{E}_\mu h\Big) = 1.
\]



□ 



The above theorem applies to a large class of functionals by the following
sufficient criterion for $\tilde d$-Lipschitzness.

Lemma 4.3. If f : H ^ M. satisfies for all i? e E+ 

sup ~ -^^^^^ < Ce^'^for k < t] and sup /(x) < C(l + i?*) 



for all R (^R , then f is Lipschitz with respect to d. 

Proof. Subsequently we a ssum e without loss of generality that ||y|| > From 
the arguments in Lemma 4.1 we know that for ||a; — y\\ > 4eexp(— 77-W.)2*+i we 

ill' + \\y\\^ , hence 

\fix)-fiy)\<\f{x)\ + \f{y)\<Cd' 



have d > \ 1 



and we only have to consider x and y that are very close. Consider a; and y such 
that llx — y\\ < 4eexp(— 7y^)2*+-'^ then we have by arguments similar to those 
in the proof of Lemma |3.6[ 



\f{x)^f{y)\ < \\f\\uJB\ 



x\\V\\y 



io))\\^-y\\ 



<r llfll (R eexp(-77(||x||V||;/||-£)VO) 
< ll/llLip [B\\x\\v\\y\\(0)) ^ „ , . d{x,y), 



H-((||x||V||y||-e)VO) 
where the coefficient in front of $\tilde d$ is bounded by assumption. □

4.1.3 Central Limit Theorem



The result above does not give any rate of convergence. With a CLT on the other
hand it is possible to derive (asymptotic) confidence intervals and so estimate
the error for a finite $n$. We state a CLT that was proved by Komorowski and
Walczuk in [27] and show that if the conditions of Theorem 2.12 or 2.14 are
satisfied, then the result of Komorowski and Walczuk applies. This leads to:



Theorem 4.4. If the conditions of Theorem 2.12 or 2.14 are satisfied, then
there exists $\sigma \in [0,+\infty)$ such that
\[
\lim_{n\to\infty}\frac1n\,\mathbb{E}\Big(\sum_{s=1}^n \tilde f(X_s)\Big)^2 = \sigma^2
\]
where $\tilde f := f - \mu(f)$ and $f$ is Lipschitz with respect to $d'$. Moreover, we have
\[
\lim_{n\to\infty}\mathbb{P}\Big(\frac{1}{\sqrt n}\sum_{i=1}^n \tilde f(X_i) < \xi\Big) = \Phi_\sigma(\xi), \quad \forall\xi \in \mathbb{R},
\]
where $\Phi_\sigma(\cdot)$ is the distribution function of $\mathcal{N}(0,\sigma^2)$, a zero mean normal law
whose variance equals $\sigma^2$.

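Let $(E,\rho)$ be a Polish metric space and $P^t$ be the transition probability
semigroup for the $E$-valued Markov process $X_t$ such that

Assumption 4.5.

1. The semigroup is Feller, i.e. $P^tC_b(E) \subset C_b(E)$, and stochastically continuous,
i.e.
\[
\lim_{t\to0+}P^tf(x) = f(x), \quad \forall x \in E,\ f \in C_b(E).
\]

2. We have $\mu P^t \in \mathcal{P}_i := \{\nu \mid \nu(E) = 1,\ \int \rho_{x_0}(x)^i\,d\nu(x) < \infty\}$ for any $\mu \in \mathcal{P}_i$
and $t \ge 0$,

3. for some $x_0 \in E$ there exists $\delta > 0$ such that for all $R < \infty$ and $T \ge 0$
\[
\sup_{t\in[0,T]}\ \sup_{x\in B_R(x_0)} \int \rho_{x_0}^{2+\delta}(y)\,P^t(x,dy) < \infty,
\]

4. there exist $x_0 \in E$ and $\delta > 0$ such that
\[
A_* := \sup_{t\ge0}\mathbb{E}\,\rho_{x_0}^{2+\delta}(X_t) < \infty,
\]

5. there exist $c, \gamma > 0$ such that
\[
d_1(\mu P^t,\nu P^t) \le ce^{-\gamma t}\,d_1(\mu,\nu), \quad \forall\mu,\nu \in \mathcal{P}_1.
\]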

Under this assumption their result reads 



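Proposition 4.6. [27] Suppose that Assumption 4.5 is satisfied with $i = 1,2$ and
$\mu_0$, the law of $X_0$, belongs to $\mathcal{P}_i$. Then, for any observable $\psi \in \operatorname{Lip}(E)$ the
following are true:

1. (the weak law of large numbers) there exists $v_* \in \mathbb{R}$ such that
\[
\lim_{T\to+\infty}\frac1T\int_0^T \psi(X_s)\,ds = v_* \quad\text{in probability.}
\]

2. (asymptotic variance) For $\tilde\psi := \psi(x) - v_*$, there is $\sigma \in [0,+\infty)$ such that
\[
\lim_{T\to\infty}\frac1T\,\mathbb{E}\Big(\int_0^T \tilde\psi(X_s)\,ds\Big)^2 = \sigma^2.
\]

3. (the CLT) Let $\Phi_\sigma$ be the c.d.f. of a $\mathcal{N}(0,\sigma^2)$ random variable, then
\[
\lim_{T\to\infty}\mathbb{P}\Big(\frac{1}{\sqrt T}\int_0^T \tilde\psi(X_s)\,ds < \xi\Big) = \Phi_\sigma(\xi), \quad \forall\xi \in \mathbb{R}.
\]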



Proof of Theorem 4.4. Since convergence in $\|\cdot\|$ is equivalent to convergence
in $d'$, $(\mathcal{H},d')$ is complete and we will verify Assumption 4.5 for $\rho = d'$. The
first part of the assumption is satisfied since $\mathcal{P}$ is Feller (c.f. Section 3.3) and
stochastic continuity is not needed for time discrete processes. To verify the
other assumptions we note that $\rho_{x_0}(x)^i$ is a Lyapunov function for $\mathcal{P}$ using the
same argument as in Lemma 3.3. For the second part we note for every finite $i$
The fourth assumptions follows because p^^^ is a Lyapunov function such that 

v-pI+' <rpl+\x,) + K 

and we can bound A* < p1'^^{Xo) + K. For third assumption we note 
sup sup Vpl+\x)< sup pI+\x) + -^K. 

i=0...nxeBa{xo} xeBn{xo) ' 

The last part is a consequence of Lemma 4.1 and Theorem 2.14.

□ 

4.2 Consequences of $L^2_\mu$-Spectral Gap

Under the assumption of Theorem 2.12 or 2.14 we have shown the existence
of an $L^2_\mu$-spectral gap in Section 2.2.2. Now we can use all existing consequences
for the ergodic average with and without burn in ($n_0 = 0$):
\[
S_{n,n_0}(f) = \frac1n\sum_{j=1}^n f(X_{j+n_0}), \qquad S_n = S_{n,0}.
\]
First of all we recall a general form of the spectral theorem for self-adjoint
bounded operators (e.g. [42]).

Proposition 4.7. Let $P$ be a bounded self-adjoint operator on some Hilbert
space $H$. Then there exist $\lambda$, $\Lambda$ such that $\sigma(P) \subset [\lambda,\Lambda]$ and an operator-valued spectral
measure $E$ with support in $[\lambda,\Lambda]$ such that
\[
\langle P^kf,g\rangle = \int a^k\,\langle E(da)f,g\rangle, \quad f,g \in H \text{ and } k \in \mathbb{N}.
\]
Let $F : [\lambda,\Lambda] \to \mathbb{R}$ be a continuous function. Then one has by the continuous
functional calculus a self-adjoint operator $F(P)$ with
\[
\langle F(P)f,g\rangle = \int F(a)\,\langle E(da)f,g\rangle, \quad f,g \in H,
\]
and
\[
\|F(P)\| = \max_{a\in\sigma(P)}|F(a)|.
\]


In the setting of Proposition 4.7 we have, due to the spectral gap, $[\lambda,\Lambda] \subset
[-\beta,\beta]$. As a consequence, the following result of [26] yields a CLT.

Proposition 4.8. ([26], statement adapted from [30]).
Suppose we have a reversible and ergodic Markov chain and a function $f \in
L^2_\mu$. If
\[
\sigma^2_{f,\mathcal{P}} = \int_{[-1,1]} \frac{1+x}{1-x}\,\langle E(dx)f,f\rangle < \infty
\]
then for $X_0 \sim \mu$ the expression $\sqrt n\,(S_n - \mu(f))$ converges weakly to $\mathcal{N}(0,\sigma^2_{f,\mathcal{P}})$.

In our case $\sigma^2_{f,\mathcal{P}}$ is bounded in terms of the spectral gap $1-\beta$, which yields a bound on
the asymptotic variance that is uniform in $m$. The result above has been extended to $\mu$-almost
every initial condition in [12], which also applies to our case.

A different approach due to [35] is to consider the MSE
\[
e_\nu(S_{n,n_0},f) = \big(\mathbb{E}_{\nu,K}\,|S_{n,n_0}(f) - S(f)|^2\big)^{\frac12}.
\]
Using Chebyshev's inequality this results in a confidence interval for $S(f)$. We
can bound it by using the following proposition from [46]:
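The passage from a mean square error bound to a confidence interval is a direct application of Chebyshev's inequality. The helper below is a hypothetical sketch (name and interface are not from the paper); it takes a bound on the mean square error $\mathbb{E}|S_{n,n_0}(f) - S(f)|^2$, that is, on the square of $e_\nu$, and returns a half-width $t$ with $\mathbb{P}(|S_{n,n_0}(f) - S(f)| \ge t) \le 1 - \text{confidence}$.

```python
import math


def chebyshev_half_width(mse_bound: float, confidence: float) -> float:
    """Half-width t with P(|S - S(f)| >= t) <= 1 - confidence.

    Uses Chebyshev's inequality P(|S - S(f)| >= t) <= E|S - S(f)|^2 / t^2,
    where mse_bound is an upper bound on the mean square error.
    """
    if mse_bound < 0.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need mse_bound >= 0 and 0 < confidence < 1")
    return math.sqrt(mse_bound / (1.0 - confidence))


if __name__ == "__main__":
    # Hypothetical numbers: MSE bound 0.01 and a 95% confidence level.
    print(chebyshev_half_width(0.01, 0.95))
```

Such intervals are non-asymptotic but conservative, in contrast to the asymptotic intervals obtained from a CLT.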

Proposition 4.9. Suppose that we have a Markov chain with Markov operator
$\mathcal{P}$ which has an $L^2_\mu$-spectral gap $1-\beta$. For $p \in (2,\infty]$ let $n_0(p)$ be the smallest
natural number which is greater than or equal to

2(p-2) ^°e,yp-2J 



1 



log(/3-i) I log (64) 



dv 
dfi 



1 



(2,4) 
[4,oo]. 



(4.6) 



Then
\[
\sup_{\|f\|_p \le 1} e_\nu(S_{n,n_0},f)^2 \le \frac{2}{n(1-\beta)} + \frac{2}{n^2(1-\beta)^2}.
\]



In our setting rio(p) is finite for u = ^ under the additional assumption that 
for all Ml > there is a U2 such that 



Using Fernique's theorem this implies that $\frac{d\nu}{d\mu} - 1$ has moments of all orders.

5 Conclusion



From an applications perspective, the primary thrust of this paper is to develop
an understanding of MCMC methods in high dimension. Our work has concentrated
on identifying the (possible lack of) dimension dependence of spectral
gaps for the standard random walk method RWM, and a recently developed
variant pCN adapted to measures defined via a density w.r.t. a Gaussian. There
are also variants of the Metropolis-adjusted Langevin algorithm (MALA) [S], as
well as Hybrid Monte-Carlo methods [S], adapted to the sampling of measures
defined via a density w.r.t. a Gaussian, and it would be interesting to employ the
weak Harris theory to study these algorithms. Other classes of target measure,
such as those arising from Besov prior measures [28, 14] or the uniform
measures in [47], would also provide interesting applications. More generally,
we expect that the weak Harris theory will be well-suited to the study of many
MCMC methods in high dimensions, because of its roots in the study of Markov
processes in infinite dimensional spaces [22]. In contrast, the theory developed
in [SZJ does not work well for the kind of high dimensional problems that are
studied here.

From a methodological perspective, we have demonstrated a particular ap- 
plication of the theory developed in [22J, demonstrating its versatility for the 
analysis of rates of convergence in Markov chains. We have also shown how 
that theory, whose cornerstone is a Wasserstein spectral gap, may usefully be 
extended to study spectral gaps, and resulting sample path properties. These 
observations will be useful in a variety of applications, not just those arising in 
the study of MCMC. 

All our results were presented for separable Hilbert spaces, but in fact they
hold on an arbitrary Banach space by using a Gaussian series (c.f. Section
3.5 in [3]) instead of the Karhunen-Loeve expansion; the $m$-independence is then
due to Theorem 3.3.6 in [6].



A Gaussian measures 

In this section we will derive the estimates for Gaussian measures that we needed
above. In the whole section $\gamma$ is a Gaussian measure on a Banach space with
covariance operator $C_\gamma$. Many estimates for Gaussian measures exploit their
quadratic-exponential moments (see Proposition 3.2). Fernique's theorem is often used to
bound integrals over the whole domain. We will use it to derive bounds on an
integral over the complement of a large ball:
\[
\int_{\{\|u\|>K\}} h(u)\,d\gamma(u).
\]
We need this to show that $\mathcal{P}$ and $\mathcal{P}_m$ are $d$-contracting (see Section 3.2.2).

Proposition A.1. (Tail estimates)

1. For $f \in C^1$:
\[
\int_{\|x\|>K} f(\|x\|)\,d\gamma = f(K)\,\gamma(\|x\| > K) + \int_K^\infty \gamma(\|x\| > t)\,f'(t)\,dt.
\]

2. For $\beta$ small enough and $\alpha \in \mathbb{R}$ there is a constant $C_{\alpha,\beta}$ such that for
$K \ge \frac{\alpha}{2\beta}$
\[
\int_{\{\|u\|>K\}} \exp(\alpha\|u\|)\,d\gamma(u) \le C_{\alpha,\beta}\,e^{-\beta K^2+\alpha K}.
\]

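Proof. Using integration by parts we get the first part
\[
\int_{\|x\|>K} f(\|x\|)\,d\gamma = f(K)\,\gamma(\|x\| > K) + \int_K^\infty \gamma(\|x\| > t)\,f'(t)\,dt.
\]
For the second part we set $f(x) = \exp(\alpha x)$ in the above and use Lemma A.2:
\[
\int_{\|x\|>K} \exp(\alpha\|x\|)\,d\gamma \le F_\beta\exp(-\beta K^2 + \alpha K) + F_\beta\,\alpha\int_K^\infty \exp(-\beta t^2 + \alpha t)\,dt.
\]
For the integral on the right hand side we use substitution and a result from [40]:
\begin{align*}
\int_K^\infty \exp(-\beta t^2 + \alpha t)\,dt &= \exp\Big(\frac{\alpha^2}{4\beta}\Big)\int_K^\infty \exp\Big(-\beta\Big(t - \frac{\alpha}{2\beta}\Big)^2\Big)\,dt \\
&= \exp\Big(\frac{\alpha^2}{4\beta}\Big)\frac{1}{\sqrt\beta}\int_{\sqrt\beta(K - \frac{\alpha}{2\beta})}^\infty \exp(-s^2)\,ds.
\end{align*}
□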

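Lemma A.2. Let $u$ be distributed according to $\gamma = \mathcal{N}(0,C)$, then we have for
$\beta$ small enough
\[
\gamma(\|u\| > K) \le F_\beta\,e^{-\beta K^2}.
\]
Proof. By Fernique's theorem (Proposition 3.2) we know that $\mathbb{E}(e^{\beta\|u\|^2}) = F_\beta < \infty$. By
Markov's inequality it follows that
\[
\gamma(\|u\| > K) \le \frac{\mathbb{E}(e^{\beta\|u\|^2})}{e^{\beta K^2}} = F_\beta\,e^{-\beta K^2}.
\]
□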
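In one dimension all quantities in Lemma A.2 are explicit, which allows a quick sanity check. The sketch below assumes $u \sim \mathcal{N}(0,1)$ on the real line, for which $F_\beta = \mathbb{E}\exp(\beta u^2) = (1-2\beta)^{-1/2}$ when $\beta < \frac12$; the function names are hypothetical.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(|u| > k) for u ~ N(0, 1)."""
    return math.erfc(k / math.sqrt(2.0))


def fernique_markov_bound(k: float, beta: float) -> float:
    """F_beta * exp(-beta k^2) with F_beta = E exp(beta u^2) = (1 - 2 beta)^{-1/2}."""
    if not 0.0 < beta < 0.5:
        raise ValueError("need 0 < beta < 1/2 for a standard normal")
    return math.exp(-beta * k * k) / math.sqrt(1.0 - 2.0 * beta)


if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0, 4.0):
        print(k, gaussian_tail(k), fernique_markov_bound(k, beta=0.25))
```

The bound is not sharp, but it captures the Gaussian decay rate $e^{-\beta K^2}$ that drives the tail estimates of Proposition A.1.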